PATCH: Using BRIN indexes for sorted output
Hi,
There have been a couple discussions about using BRIN indexes for
sorting - in fact this was mentioned even in the "Improving Indexing
Performance" unconference session this year (don't remember by whom).
But I haven't seen any patches, so here's one.
The idea is that we can use information about ranges to split the table
into smaller parts that can be sorted in smaller chunks. For example if
you have a tiny 2MB table with two ranges, with values in [0,100] and
[101,200] intervals, then it's clear we can sort the first range, output
tuples, and then sort/output the second range.
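To illustrate the idea, a tiny sketch in Python (purely illustrative - the ranges list and its layout are made up, not the patch's actual data structures):

```python
# Illustrative only: emit sorted output range by range, assuming the
# BRIN ranges do not overlap, so each chunk can be sorted independently.

def brin_sort(ranges):
    """ranges: list of (minval, maxval, tuples) summaries, non-overlapping."""
    # Process ranges in order of their minimum value; since the ranges
    # don't overlap, each sorted chunk can be emitted immediately.
    for _, _, tuples in sorted(ranges, key=lambda r: r[0]):
        yield from sorted(tuples)

out = list(brin_sort([(101, 200, [150, 101, 200]),
                      (0, 100, [42, 7, 99])]))
print(out)  # [7, 42, 99, 101, 150, 200]
```

Overlapping ranges are what make the real node more complicated.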
The attached patch builds "BRIN Sort" paths/plans, closely resembling
index scans, only for BRIN indexes. And this special type of index scan
does what was mentioned above - incrementally sorts the data. It's a bit
more complicated because of overlapping ranges, ASC/DESC, NULL etc.
This is disabled by default, using a GUC enable_brinsort (you may need
to tweak other GUCs to disable parallel plans etc.).
A trivial example, demonstrating the benefits:
create table t (a int) with (fillfactor = 10);
insert into t select i from generate_series(1,10000000) s(i);
create index t_a_idx on t using brin (a);
First, a simple LIMIT query:
explain (analyze, costs off) select * from t order by a limit 10;
QUERY PLAN
------------------------------------------------------------------------
Limit (actual time=1879.768..1879.770 rows=10 loops=1)
-> Sort (actual time=1879.767..1879.768 rows=10 loops=1)
Sort Key: a
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on t
(actual time=0.007..1353.110 rows=10000000 loops=1)
Planning Time: 0.083 ms
Execution Time: 1879.786 ms
(7 rows)
The same query, with enable_brinsort = on:
QUERY PLAN
------------------------------------------------------------------------
Limit (actual time=1.217..1.219 rows=10 loops=1)
-> BRIN Sort using t_a_idx on t
(actual time=1.216..1.217 rows=10 loops=1)
Sort Key: a
Planning Time: 0.084 ms
Execution Time: 1.234 ms
(5 rows)
That's a pretty nice improvement - of course, this is thanks to the data
being perfectly sequential, and the difference can be made almost
arbitrarily large by making the table smaller/larger. Similarly, if the
table gets less sequential (making the ranges overlap), the BRIN plan
will get more expensive. Feel free to experiment with other data sets.
However, it's not just LIMIT queries that can improve - consider a sort
of the whole table:
test=# explain (analyze, costs off) select * from t order by a;
QUERY PLAN
-------------------------------------------------------------------------
Sort (actual time=2806.468..3487.213 rows=10000000 loops=1)
Sort Key: a
Sort Method: external merge Disk: 117528kB
-> Seq Scan on t (actual time=0.018..1498.754 rows=10000000 loops=1)
Planning Time: 0.110 ms
Execution Time: 3766.825 ms
(6 rows)
The same query, with enable_brinsort = on:
test=# explain (analyze, costs off) select * from t order by a;
QUERY PLAN
----------------------------------------------------------------------------------
BRIN Sort using t_a_idx on t (actual time=1.210..2670.875 rows=10000000
loops=1)
Sort Key: a
Planning Time: 0.073 ms
Execution Time: 2939.324 ms
(4 rows)
Right - not a huge difference, but still a nice 25% speedup, mostly due
to not having to spill data to disk and sorting smaller amounts of data.
There's a bunch of issues with this initial version of the patch,
usually described in XXX comments in the relevant places.
1) The paths are created in build_index_paths() because that's what
creates index scans (which the new path resembles). But that is expected
to produce IndexPath, not BrinSortPath, so it's not quite correct.
Should be somewhere "higher" I guess.
2) BRIN indexes don't have internal ordering, i.e. ASC/DESC and NULLS
FIRST/LAST does not really matter for them. The patch just generates
paths for all 4 combinations (or tries to). Maybe there's a better way.
3) I'm not quite sure the separation of responsibilities between
opfamily and opclass is optimal. I added a new amproc, but maybe this
should be split differently. At the moment only minmax indexes have
this, but adding this to minmax-multi should be trivial.
4) The state changes in nodeBrinSort are a bit confusing. It works, but
may need cleanup and refactoring. Ideas welcome.
5) The costing is essentially just plain cost_index. I have some ideas
about BRIN costing in general, which I'll post in a separate thread (as
it's not specific to this patch).
6) At the moment this only picks one of the index keys, specified in the
ORDER BY clause. I think we can generalize this to multiple keys, but
thinking about multi-key ranges was a bit too much for me. The good
thing is this nicely combines with IncrementalSort.
7) Only plain index keys for the ORDER BY keys, no expressions. Should
not be hard to fix, though.
8) Parallel version is not supported, but I think it shouldn't be
possible. Just make the leader build the range info, and then let the
workers acquire/sort ranges and merge them by Gather Merge.
9) I was also thinking about leveraging other indexes to quickly
eliminate ranges that need to be sorted. The node does evaluate the
filter, of course, but only after reading the tuple from the range. But imagine
we allow BrinSort to utilize BRIN indexes to evaluate the filter - in
that case we might skip many ranges entirely. Essentially like a bitmap
index scan does, except that building the bitmap incrementally with BRIN
is trivial - you can quickly check if a particular range matches or not.
With other indexes (e.g. btree) you essentially need to evaluate the
filter completely, and only then you can look at the bitmap. Which seems
rather against the idea of this patch, which is about low startup cost.
Of course, the condition might be very selective, but then you probably
can just fetch the matching tuples and do a Sort.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Allow-BRIN-indexes-to-produce-sorted-output-20221015.patch (text/x-patch, +2779 -2)
On Sat, Oct 15, 2022 at 5:34 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
...
Hi,
I am still going over the patch.
Minor: for #8, I guess you meant `it should be possible`.
Cheers
On 10/15/22 15:46, Zhihong Yu wrote:
...
8) Parallel version is not supported, but I think it shouldn't be
possible. Just make the leader build the range info, and then let the
workers to acquire/sort ranges and merge them by Gather Merge.
...
Hi,
I am still going over the patch.
Minor: for #8, I guess you meant `it should be possible`.
Yes, I meant to say it should be possible. Sorry for the confusion.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 15, 2022 at 8:23 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
...
Hi,
For brin_minmax_ranges, looking at the assignment to gottuple and
reading gottuple, it seems variable gottuple can be omitted - we can check
tup directly.
+ /* Maybe mark the range as processed. */
+ range->processed |= mark_processed;
`Maybe` can be dropped.
For brinsort_load_tuples(), do we need to check for interrupts inside the
loop?
Similar question for subsequent methods involving loops, such
as brinsort_load_unsummarized_ranges.
Cheers
On 10/15/22 14:33, Tomas Vondra wrote:
Hi,
...
There's a bunch of issues with this initial version of the patch,
usually described in XXX comments in the relevant places.
...
I forgot to mention one important issue in my list yesterday, and that's
memory consumption. The way the patch is coded now, the new BRIN support
function (brin_minmax_ranges) produces information about *all* ranges in
one go, which may be an issue. The worst case is 32TB table, with 1-page
BRIN ranges, which means ~4 billion ranges. The info is an array of ~32B
structs, so this would require ~128GB of RAM. With the default 128-page
ranges, it'd still be ~1GB, which is quite a lot.
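The arithmetic, spelled out (assuming 8kB pages and the ~32B per-range struct mentioned above):

```python
# Worst case: 32TB table, 1-page BRIN ranges, ~32 bytes of info per range.
page = 8 * 1024
table = 32 * 2**40
struct_size = 32

ranges_1page = table // page                   # ~4.3 billion ranges
print(ranges_1page * struct_size / 2**30)      # 128.0 (GiB of range info)

ranges_default = table // (128 * page)         # default 128-page ranges
print(ranges_default * struct_size / 2**30)    # 1.0 (GiB)
```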
We could have a discussion about what's the reasonable size of BRIN
ranges on such large tables (e.g. building a bitmap on 4 billion ranges
is going to be "not cheap" so this is likely pretty rare). But we should
not introduce new nodes that ignore work_mem, so we need a way to deal
with such cases somehow.
The easiest solution likely is to check this while planning - we can
check the table size, calculate the number of BRIN ranges, and check
that the range info fits into work_mem, and just not create the path
when it gets too large. That's what we did for HashAgg, although that
decision was unreliable because estimating GROUP BY cardinality is hard.
The wrinkle here is that counting just the range info (BrinRange struct)
does not include the values for by-reference types. We could use average
width - that's just an estimate, though.
A more comprehensive solution seems to be to allow requesting chunks of
the BRIN ranges. So that we'd get "slices" of ranges and we'd process
those. So for example if you have 1000 ranges, and you can only handle
100 at a time, we'd do 10 loops, each requesting 100 ranges.
This has another problem - we do care about "overlaps", and we can't
really know if the overlapping ranges will be in the same "slice"
easily. The chunks would be sorted (for example) by maxval. But there
can be a range with much higher maxval (thus in some future slice), but
very low minval (thus intersecting with ranges in the current slice).
Imagine ranges with these minval/maxval values, sorted by maxval:
[101,200]
[201,300]
[301,400]
[150,500]
and let's say we can only process 2-range slices. So we'll get the first
two, but both of them intersect with the very last range.
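To make the problem concrete, a small sketch (hypothetical code, not the patch's API) that checks which later ranges intersect the current slice:

```python
# Ranges sorted by maxval, processed in 2-range slices; which ranges in
# later slices intersect the current slice's maxval boundary?

ranges = [(101, 200), (201, 300), (301, 400), (150, 500)]  # (minval, maxval)
by_maxval = sorted(ranges, key=lambda r: r[1])

slice_ranges = by_maxval[:2]                 # first slice: (101,200), (201,300)
boundary = max(r[1] for r in slice_ranges)   # 300

# Any later range with minval <= boundary intersects the current slice.
intersecting = [r for r in by_maxval[2:] if r[0] <= boundary]
print(intersecting)  # [(150, 500)]
```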
We could always include all the intersecting ranges into the slice, but
what if there are too many very "wide" ranges?
So I think this will need to switch to an iterative communication with
the BRIN index - instead of asking "give me info about all the ranges",
we'll need a way to
- request the next range (sorted by maxval)
- request the intersecting ranges one by one (sorted by minval)
Of course, the BRIN side will have some of the same challenges with
tracking the info without breaking the work_mem limit, but I suppose it
can store the info into a tuplestore/tuplesort, and use that instead of
plain in-memory array. Alternatively, it could just return those, and
BrinSort would use that. OTOH it seems cleaner to have some sort of API,
especially if we want to support e.g. minmax-multi opclasses, that have
a more complicated concept of "intersection".
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Oct 16, 2022 at 6:51 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
...
Hi,
In your example involving [150,500], can this range be broken down into 4
ranges, ending in 200, 300, 400 and 500, respectively?
That way, there is no intersection among the ranges.
bq. can store the info into a tuplestore/tuplesort
Wouldn't this involve disk accesses which may reduce the effectiveness of
BRIN sort ?
Cheers
On 10/16/22 03:36, Zhihong Yu wrote:
...
Hi,
For brin_minmax_ranges, looking at the assignment to gottuple and
reading gottuple, it seems variable gottuple can be omitted - we can
check tup directly.

+ /* Maybe mark the range as processed. */
+ range->processed |= mark_processed;

`Maybe` can be dropped.
No, because the "mark_processed" may be false. So we may not mark it as
processed in some cases.
For brinsort_load_tuples(), do we need to check for interrupts inside
the loop ?
Similar question for subsequent methods involving loops, such
as brinsort_load_unsummarized_ranges.
We could/should, although most of the loops should be very short.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/16/22 16:01, Zhihong Yu wrote:
...
Hi,
In your example involving [150,500], can this range be broken down into
4 ranges, ending in 200, 300, 400 and 500, respectively ?
That way, there is no intersection among the ranges.
Not really, I think. These "value ranges" map to "page ranges" and how
would you split those? I mean, you know values [150,500] map to blocks
[0,127]. You split the values into [150,200], [201,300], [301,400]. How
do you split the page range [0,127]?
Also, splitting a range into more ranges is likely making the issue
worse, because it increases the number of ranges, right? And I mean,
much worse, because imagine a "wide" range that overlaps with every
other range - the number of ranges would explode.
It's not clear to me at which point you'd make the split. At the
beginning, right after loading the ranges from the BRIN index? A lot of that
may be unnecessary, in case the range is loaded as a "non-intersecting"
range.
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from the "spill" tuplestore, compare to maxval
6. load all tuples from unsummarized ranges (first range only)
(into tuplesort/tuplestore, depending on maxval comparison)
7. load all intersecting ranges (with minval < current maxval)
(into tuplesort/tuplestore, depending on maxval comparison)
8. sort the tuplesort, output all tuples, then back to (4)
9. NULLS LAST: read all ranges that might have NULLs => output
10. done
For "DESC" ordering the process is almost the same, except that we swap
minval/maxval in most places.
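For illustration, here's a condensed sketch of steps 4-8 for the ASC case in Python (names and data structures are mine, not the patch's; NULL handling and unsummarized ranges are omitted, and everything is kept in memory):

```python
# Illustrative sketch: ranges may overlap; tuples above the current
# maxval boundary are "spilled" and reloaded at a later boundary.

def brin_sort_asc(ranges):
    """ranges: list of (minval, maxval, tuples) summaries; may overlap.
    Yields all tuples in ascending order, sorting one chunk at a time."""
    pending = sorted(ranges, key=lambda r: r[1])   # step 2: sort by maxval
    processed = [False] * len(pending)
    spill = []                                     # tuples above the boundary

    for i, (_, maxval, tuples) in enumerate(pending):
        sortbuf = []
        if not processed[i]:                       # step 4: next range by maxval
            processed[i] = True
            sortbuf += tuples
        # step 5: reload spilled tuples that fall under the new boundary
        sortbuf += [t for t in spill if t <= maxval]
        spill = [t for t in spill if t > maxval]
        # step 7: load intersecting ranges (minval within the boundary)
        for j in range(i + 1, len(pending)):
            jmin, _, jtuples = pending[j]
            if not processed[j] and jmin <= maxval:
                processed[j] = True
                sortbuf += [t for t in jtuples if t <= maxval]
                spill += [t for t in jtuples if t > maxval]
        # step 8: sort this chunk and emit it
        yield from sorted(sortbuf)

out = list(brin_sort_asc([(101, 200, [150, 199]),
                          (201, 300, [250, 201]),
                          (150, 500, [150, 480, 300])]))
print(out)  # [150, 150, 199, 201, 250, 300, 480]
```

Note how the wide [150,500] range gets loaded at the first boundary, with its high tuples spilled and emitted only at later boundaries.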
bq. can store the info into a tuplestore/tuplesort
Wouldn't this involve disk accesses which may reduce the effectiveness
of BRIN sort ?
Yes, it might. But the question is whether the result is still faster
than alternative plans (e.g. seqscan+sort), and those are likely to do
even more I/O.
Moreover, for "regular" cases this shouldn't be a significant issue,
because the stuff will fit into work_mem and so there'll be no I/O. But
it'll handle those extreme cases gracefully.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
I forgot to mention one important issue in my list yesterday, and that's
memory consumption.
TBH, this is all looking like vastly more complexity than benefit.
It's going to be impossible to produce a reliable cost estimate
given all the uncertainty, and I fear that will end in picking
BRIN-based sorting when it's not actually a good choice.
The examples you showed initially are cherry-picked to demonstrate
the best possible case, which I doubt has much to do with typical
real-world tables. It would be good to see what happens with
not-perfectly-sequential data before even deciding this is worth
spending more effort on. It also seems kind of unfair to decide
that the relevant comparison point is a seqscan rather than a
btree indexscan.
regards, tom lane
On Sun, Oct 16, 2022 at 7:33 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
On 10/16/22 16:01, Zhihong Yu wrote:
On Sun, Oct 16, 2022 at 6:51 AM Tomas Vondra
<tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
wrote:On 10/15/22 14:33, Tomas Vondra wrote:
Hi,
...
There's a bunch of issues with this initial version of the patch,
usually described in XXX comments in the relevant places.6)...
I forgot to mention one important issue in my list yesterday, and
that's
memory consumption. The way the patch is coded now, the new BRIN
support
function (brin_minmax_ranges) produces information about *all*
ranges in
one go, which may be an issue. The worst case is 32TB table, with
1-page
BRIN ranges, which means ~4 billion ranges. The info is an array of
~32B
structs, so this would require ~128GB of RAM. With the default
128-page
ranges, it's still be ~1GB, which is quite a lot.
We could have a discussion about what's the reasonable size of BRIN
ranges on such large tables (e.g. building a bitmap on 4 billionranges
is going to be "not cheap" so this is likely pretty rare). But we
should
not introduce new nodes that ignore work_mem, so we need a way to
deal
with such cases somehow.
The easiest solution likely is to check this while planning - we can
check the table size, calculate the number of BRIN ranges, and check
that the range info fits into work_mem, and just not create the path
when it gets too large. That's what we did for HashAgg, although that
decision was unreliable because estimating GROUP BY cardinality ishard.
The wrinkle here is that counting just the range info (BrinRange
struct)
does not include the values for by-reference types. We could use
average
width - that's just an estimate, though.
A more comprehensive solution seems to be to allow requesting chunks
of
the BRIN ranges. So that we'd get "slices" of ranges and we'd process
those. So for example if you have 1000 ranges, and you can onlyhandle
100 at a time, we'd do 10 loops, each requesting 100 ranges.
This has another problem - we do care about "overlaps", and we can't
really know if the overlapping ranges will be in the same "slice"
easily. The chunks would be sorted (for example) by maxval. But there
can be a range with much higher maxval (thus in some future slice),but
very low minval (thus intersecting with ranges in the current slice).
Imagine ranges with these minval/maxval values, sorted by maxval:
[101,200]
[201,300]
[301,400]
[150,500]and let's say we can only process 2-range slices. So we'll get the
first
two, but both of them intersect with the very last range.
We could always include all the intersecting ranges into the slice,
but
what if there are too many very "wide" ranges?
So I think this will need to switch to an iterative communication with
the BRIN index - instead of asking "give me info about all the ranges",
we'll need a way to
- request the next range (sorted by maxval)
- request the intersecting ranges one by one (sorted by minval)
Of course, the BRIN side will have some of the same challenges with
tracking the info without breaking the work_mem limit, but I suppose it
can store the info into a tuplestore/tuplesort, and use that instead of
a plain in-memory array. Alternatively, it could just return those, and
BrinSort would use that. OTOH it seems cleaner to have some sort of API,
especially if we want to support e.g. minmax-multi opclasses, that have
a more complicated concept of "intersection".
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
In your example involving [150,500], can this range be broken down into
4 ranges, ending in 200, 300, 400 and 500, respectively?
That way, there is no intersection among the ranges.
Not really, I think. These "value ranges" map to "page ranges" and how
would you split those? I mean, you know values [150,500] map to blocks
[0,127]. You split the values into [150,200], [201,300], [301,400]. How
do you split the page range [0,127]?
Also, splitting a range into more ranges is likely making the issue
worse, because it increases the number of ranges, right? And I mean,
much worse, because imagine a "wide" range that overlaps with every
other range - the number of ranges would explode.
It's not clear to me at which point you'd make the split. At the
beginning, right after loading the ranges from BRIN index? A lot of that
may be unnecessary, in case the range is loaded as a "non-intersecting"
range.
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from "splill" tuplestore, compare to maxval
6. load all tuples from no-summarized ranges (first range only)
(into tuplesort/tuplestore, depending on maxval comparison)
7. load all intersecting ranges (with minval < current maxval)
(into tuplesort/tuplestore, depending on maxval comparison)
8. sort the tuplesort, output all tuples, then back to (4)
9. NULLS LAST: read all ranges that might have NULLs => output
10. done
For "DESC" ordering the process is almost the same, except that we swap
minval/maxval in most places.
Hi,
Thanks for the quick reply.
I don't have a good answer w.r.t. splitting the page range [0,127] now.
Let me think more about it.
The 10-step flow (subject to changes down the road) should either be
given in the description of the patch or written as a comment inside the
code. This would help people grasp the concept much faster.
BTW "splill" seems to be a typo - I assume you meant "spill".
Cheers
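For illustration only, the 10-step flow above might be reduced to a toy Python model (made-up names; plain lists stand in for the tuplesort and the "splill" tuplestore; NULL handling and non-summarized ranges are omitted):

```python
# Toy model of the maxval-ordered flow: take the next range by maxval, pull
# spilled tuples that now fit under the new threshold, load newly
# intersecting ranges (minval <= threshold), sort, output.
def brinsort_maxval(ranges, tuples_for):
    by_maxval = sorted(ranges, key=lambda r: r[1])
    by_minval = sorted(ranges, key=lambda r: r[0])
    spill, out, next_overlap = [], [], 0
    for rng in by_maxval:
        threshold = rng[1]                 # values above this may come later
        batch = [v for v in spill if v <= threshold]   # step 5
        spill = [v for v in spill if v > threshold]
        # step 7: load ranges intersecting the current one (minval <=
        # current maxval); next_overlap remembers how far we already got
        # in the minval-sorted array, so every range is loaded only once
        while (next_overlap < len(by_minval)
               and by_minval[next_overlap][0] <= threshold):
            for v in tuples_for(by_minval[next_overlap]):
                (batch if v <= threshold else spill).append(v)
            next_overlap += 1
        out.extend(sorted(batch))          # step 8: sort the batch, output
    out.extend(sorted(spill))
    return out

# tuples_for is a hypothetical per-range tuple source.
data = {(0, 10): [7, 0, 10], (5, 20): [20, 5, 9], (15, 30): [30, 15]}
print(brinsort_maxval(data.keys(), lambda r: data[r]))
```

Each range is loaded exactly once, when its minval first falls under the current threshold.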
On 10/16/22 16:41, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
I forgot to mention one important issue in my list yesterday, and that's
memory consumption.
TBH, this is all looking like vastly more complexity than benefit.
It's going to be impossible to produce a reliable cost estimate
given all the uncertainty, and I fear that will end in picking
BRIN-based sorting when it's not actually a good choice.
Maybe. If it turns out the estimates we have are insufficient to make
good planning decisions, that's life.
As I wrote in my message, I know the BRIN costing is a bit shaky in
general (not just for this new operation), and I intend to propose some
improvement in a separate patch.
I think the main issue with BRIN costing is that we have no stats about
the ranges, and we can't estimate how many ranges we'll really end up
accessing. If you have 100 rows, will that be 1 range or 100 ranges? Or
for the BRIN Sort, how many overlapping ranges will there be?
I intend to allow index AMs to collect custom statistics, and the BRIN
minmax opfamily would collect e.g. this:
1) number of non-summarized ranges
2) number of all-nulls ranges
3) number of has-nulls ranges
4) average number of overlaps (given a random range, how many other
ranges intersect with it)
5) how likely is it for a row to hit multiple ranges (cross-check
sample rows vs. ranges)
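For instance, statistic (4) could in principle be derived like this (a toy O(N^2) sketch over (minval, maxval) pairs; not actual ANALYZE code, which would need something smarter for many ranges):

```python
# Hypothetical sketch of stat (4): the average number of other ranges each
# range intersects with. Quadratic for clarity only.
def avg_overlaps(ranges):
    n = len(ranges)
    if n < 2:
        return 0.0
    total = 0
    for i, (lo_a, hi_a) in enumerate(ranges):
        for lo_b, hi_b in ranges[i + 1:]:
            if lo_a <= hi_b and lo_b <= hi_a:
                total += 2  # each intersection counts for both ranges
    return total / n

print(avg_overlaps([(0, 100), (101, 200), (150, 250), (190, 1000)]))  # 1.5
```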
I believe this will allow much better / more reliable BRIN costing (the
number of overlaps is particularly useful for this patch).
The examples you showed initially are cherry-picked to demonstrate
the best possible case, which I doubt has much to do with typical
real-world tables. It would be good to see what happens with
not-perfectly-sequential data before even deciding this is worth
spending more effort on.
Yes, the example was a trivial "happy case" example. Obviously, the
performance degrades as the data becomes more random (with wider
ranges), forcing the BRIN Sort to read / sort more tuples.
But let's see an example with less correlated data, say, like this:
create table t (a int) with (fillfactor = 10);
insert into t select i + 10000 * random()
from generate_series(1,10000000) s(i);
With the fillfactor=10, there are ~2500 values per 1MB range, so this
means each range overlaps with ~4 more. The results then look like this:
1) select * from t order by a;
seqscan+sort: 4437 ms
brinsort: 4233 ms
2) select * from t order by a limit 10;
seqscan+sort: 1859 ms
brinsort: 4 ms
If you increase the random factor from 10000 to 100000 (so, 40 ranges),
the seqscan timings remain about the same, while brinsort gets to 5200
and 20 ms. And with 1M, it's ~6000 and 300 ms.
Only at 5000000, where we pretty much read 1/2 the table because the
ranges intersect, do we get the same timing as the seqscan (for the
LIMIT query). The "full sort" query is more like 5000 vs. 6600 ms, so slower
but not by a huge amount.
Yes, this is a very simple example. I can do more tests with other
datasets (larger/smaller, different distribution, ...).
It also seems kind of unfair to decide
that the relevant comparison point is a seqscan rather than a
btree indexscan.
I don't think it's all that unfair. How likely is it to have both a BRIN
and btree index on the same column? And even if you do have such indexes
(say, on different sets of keys), we kinda already have this costing
issue with index and bitmap index scans.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10/16/22 16:42, Zhihong Yu wrote:
...
I don't have a good answer w.r.t. splitting the page range [0,127] now.
Let me think more about it.
Sure, no problem.
The 10-step flow (subject to changes down the road) should either be
given in the description of the patch or written as a comment inside
the code.
This would help people grasp the concept much faster.
True. I'll add it to the next version of the patch.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, 16 Oct 2022 at 16:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It also seems kind of unfair to decide
that the relevant comparison point is a seqscan rather than a
btree indexscan.
I think the comparison against full table scan seems appropriate, as
the benefit of BRIN is less space usage when compared to other
indexes, and better IO selectivity than full table scans.
A btree easily requires 10x the space of a normal BRIN index, and may
require a lot of random IO whilst scanning. This BRIN-sorted scan
would have a much lower random IO cost during its scan, and would help
bridge the performance gap between having index that supports ordered
retrieval, and no index at all, which is especially steep in large
tables.
I think that BRIN would be an alternative to btree as a provider of
sorted data, even when the table is not 100% clustered. This
BRIN-assisted table sort can help reduce the amount of data that is
accessed in top-N sorts significantly, both at the index and at the
relation level, without having the space overhead of "all sortable
columns get a btree index".
If BRIN gets its HOT optimization back, the benefits would be even
larger, as we would then have an index that can speed up top-N sorts
without bloating other indexes, and at very low disk footprint.
Columns that are only occasionally accessed in a sorted manner could
then get BRIN minmax indexes to support this sort, at minimal overhead
to the rest of the application.
Kind regards,
Matthias van de Meent
First of all, it's really great to see that this is being worked on.
On Sun, 16 Oct 2022 at 16:34, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
Why sort on maxval and minval? That seems wasteful for effectively all
sorts, where range sort on minval should suffice: If you find a range
that starts at 100 in a list of ranges sorted at minval, you've
processed all values <100. You can't make a similar comparison when
that range is sorted on maxvals.
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from "splill" tuplestore, compare to maxval
Instead of this, shouldn't an update to tuplesort that allows for
restarting the sort be better than this? Moving tuples that we've
accepted into BRINsort state but not yet returned around seems like a
waste of cycles, and I can't think of a reason why it can't work.
6. load all tuples from no-summarized ranges (first range only)
(into tuplesort/tuplestore, depending on maxval comparison)
7. load all intersecting ranges (with minval < current maxval)
(into tuplesort/tuplestore, depending on maxval comparison)
8. sort the tuplesort, output all tuples, then back to (4)
9. NULLS LAST: read all ranges that might have NULLs => output
10. done
For "DESC" ordering the process is almost the same, except that we swap
minval/maxval in most places.
When I was thinking about this feature at the PgCon unconference, I
was thinking about it more along the lines of the following system
(for ORDER BY col ASC NULLS FIRST):
1. prepare tuplesort Rs (for Rangesort) for BRIN tuples, ordered by
[has_nulls, min ASC]
2. scan info about ranges from BRIN, store them in Rs.
3. Finalize the sorting of Rs.
4. prepare tuplesort Ts (for Tuplesort) for sorting on the specified
column ordering.
5. load all tuples from no-summarized ranges into Ts'
6. while Rs has a block range Rs' with has_nulls:
- Remove Rs' from Rs
- store the tuples of Rs' range in Ts.
We now have all tuples with NULL in our sorted set; max_sorted = (NULL)
7. Finalize the Ts sorted set.
8. While the next tuple Ts' in the Ts tuplesort <= max_sorted
- Remove Ts' from Ts
- Yield Ts'
Now, all tuples up to and including max_sorted are yielded.
9. If there are no more ranges in Rs:
- Yield all remaining tuples from Ts, then return.
10. "un-finalize" Ts, so that we can start adding tuples to that tuplesort.
This is different from Tomas' implementation, as he loads the
tuples into a new tuplestore.
11. get the next item from Rs: Rs'
- remove Rs' from Rs
- assign Rs' min value to max_sorted
- store the tuples of Rs' range in Ts
12. while the next item Rs' from Rs has a min value of max_sorted:
- remove Rs' from Rs
- store the tuples of Rs' range in Ts
13. The 'new' value from the next item from Rs is stored in
max_sorted. If no such item exists, max_sorted is assigned a sentinel
value (+INF)
14. Go to Step 7
This set of operations requires a restarting tuplesort for Ts, but I
don't think that would result in many API changes for tuplesort. It
reduces the overhead of large overlapping ranges, as it doesn't need
to copy all tuples that have been read from disk but have not yet been
returned.
The maximum cost of this tuplesort would be the cost of sorting a
seqscanned table, plus sorting the relevant BRIN ranges, plus the 1
extra compare per tuple and range that are needed to determine whether
the range or tuple should be extracted from the tuplesort. The minimum
cost would be the cost of sorting all BRIN ranges, plus sorting all
tuples in one of the index's ranges.
Kind regards,
Matthias van de Meent
PS. Are you still planning on giving the HOT optimization for BRIN a
second try? I'm fairly confident that my patch at [0] would fix the
issue that led to the revert of that feature, but it introduced ABI
changes after the feature freeze and thus it didn't get in. The patch
might need some polishing, but I think it shouldn't take too much
extra effort to get into PG16.
[0]: /messages/by-id/CAEze2Wi9=Bay_=rTf8Z6WPgZ5V0tDOayszQJJO=R_9aaHvr+Tg@mail.gmail.com
On 10/16/22 22:17, Matthias van de Meent wrote:
First of all, it's really great to see that this is being worked on.
On Sun, 16 Oct 2022 at 16:34, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
Why sort on maxval and minval? That seems wasteful for effectively all
sorts, where range sort on minval should suffice: If you find a range
that starts at 100 in a list of ranges sorted at minval, you've
processed all values <100. You can't make a similar comparison when
that range is sorted on maxvals.
Because that allows us to identify overlapping ranges quickly.
Imagine you have the ranges sorted by maxval, which allows you to add
tuples in small increments. But how do you know there's not a range
(possibly with arbitrarily high maxval), that however overlaps with the
range we're currently processing?
Consider these ranges sorted by maxval
range #1 [0,100]
range #2 [101,200]
range #3 [150,250]
...
range #1000000 [190,1000000000]
processing the range #1 is simple, because there are no overlapping
ranges. When processing range #2, that's not the case - the following
range #3 is overlapping too, so we need to load the tuples too. But
there may be other ranges (in arbitrary distance) also overlapping.
So we either have to cross-check everything with everything - that's
O(N^2) so not great, or we can invent a way to eliminate ranges that
can't overlap.
The patch does that by having two arrays - one sorted by maxval, one
sorted by minval. After proceeding to the next range by maxval (using
the first array), the minval-sorted array is used to detect overlaps.
This can be done quickly, because we only care for new matches since the
previous range, so we can remember the index to the array and start from
it. And we can stop once the minval exceeds the maxval for the range in
the first step. Because we'll only sort tuples up to that point.
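In toy Python form (made-up names; ranges as (minval, maxval) pairs; not the patch's actual C code), the two-array trick looks like:

```python
# Sketch of the two-array scheme: one list sorted by maxval (processing
# order), one sorted by minval (overlap detection). 'start' remembers how
# far into the minval-sorted list we already looked, so each step only
# scans the newly-reachable entries and stops at the current maxval.
def overlapping(by_minval, maxval, start):
    """Return newly matching ranges with minval <= maxval, resuming at 'start'."""
    found = []
    i = start
    while i < len(by_minval) and by_minval[i][0] <= maxval:
        found.append(by_minval[i])
        i += 1
    return found, i  # i becomes 'start' for the next range

ranges = [(0, 100), (101, 200), (150, 250), (190, 1000)]
by_maxval = sorted(ranges, key=lambda r: r[1])
by_minval = sorted(ranges, key=lambda r: r[0])

start = 0
for rng in by_maxval:
    batch, start = overlapping(by_minval, rng[1], start)
    print(rng, "->", batch)  # the batch includes rng itself on first match
```

Each range shows up in exactly one batch, so the total work over all steps stays linear in the number of ranges.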
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from "splill" tuplestore, compare to maxval
Instead of this, shouldn't an update to tuplesort that allows for
restarting the sort be better than this? Moving tuples that we've
accepted into BRINsort state but not yet returned around seems like a
waste of cycles, and I can't think of a reason why it can't work.
I don't understand what you mean by "update to tuplesort". Can you
elaborate?
The point of spilling them into a tuplestore is to make the sort cheaper
by not sorting tuples that can't possibly be produced, because the value
exceeds the current maxval. Consider ranges sorted by maxval
[0,1000]
[500,1500]
[1001,2000]
...
We load tuples from [0,1000] and use 1000 as "threshold" up to which we
can sort. But we have to load tuples from the overlapping range(s) too,
e.g. from [500,1500] except that all tuples with values > 1000 can't be
produced (because there might be yet more ranges intersecting with that
part).
So why sort these tuples at all? Imagine an imperfectly correlated table
where each range overlaps with ~10 other ranges. If we feed all of that
into the tuplesort, we're now sorting 11x the amount of data.
Or maybe I just don't understand what you mean.
6. load all tuples from no-summarized ranges (first range only)
(into tuplesort/tuplestore, depending on maxval comparison)
7. load all intersecting ranges (with minval < current maxval)
(into tuplesort/tuplestore, depending on maxval comparison)
8. sort the tuplesort, output all tuples, then back to (4)
9. NULLS LAST: read all ranges that might have NULLs => output
10. done
For "DESC" ordering the process is almost the same, except that we swap
minval/maxval in most places.
When I was thinking about this feature at the PgCon unconference, I
was thinking about it more along the lines of the following system
(for ORDER BY col ASC NULLS FIRST):
1. prepare tuplesort Rs (for Rangesort) for BRIN tuples, ordered by
[has_nulls, min ASC]
2. scan info about ranges from BRIN, store them in Rs.
3. Finalize the sorting of Rs.
4. prepare tuplesort Ts (for Tuplesort) for sorting on the specified
column ordering.
5. load all tuples from no-summarized ranges into Ts'
6. while Rs has a block range Rs' with has_nulls:
- Remove Rs' from Rs
- store the tuples of Rs' range in Ts.
We now have all tuples with NULL in our sorted set; max_sorted = (NULL)
7. Finalize the Ts sorted set.
8. While the next tuple Ts' in the Ts tuplesort <= max_sorted
- Remove Ts' from Ts
- Yield Ts'
Now, all tuples up to and including max_sorted are yielded.
9. If there are no more ranges in Rs:
- Yield all remaining tuples from Ts, then return.
10. "un-finalize" Ts, so that we can start adding tuples to that tuplesort.
This is different from Tomas' implementation, as he loads the
tuples into a new tuplestore.
11. get the next item from Rs: Rs'
- remove Rs' from Rs
- assign Rs' min value to max_sorted
- store the tuples of Rs' range in Ts
I don't think this works, because we may get a range (Rs') with very
high maxval (thus read very late from Rs), but with very low minval.
AFAICS max_sorted must never go back, and this breaks it.
12. while the next item Rs' from Rs has a min value of max_sorted:
- remove Rs' from Rs
- store the tuples of Rs' range in Ts
13. The 'new' value from the next item from Rs is stored in
max_sorted. If no such item exists, max_sorted is assigned a sentinel
value (+INF)
14. Go to Step 7
This set of operations requires a restarting tuplesort for Ts, but I
don't think that would result in many API changes for tuplesort. It
reduces the overhead of large overlapping ranges, as it doesn't need
to copy all tuples that have been read from disk but have not yet been
returned.
The maximum cost of this tuplesort would be the cost of sorting a
seqscanned table, plus sorting the relevant BRIN ranges, plus the 1
extra compare per tuple and range that are needed to determine whether
the range or tuple should be extracted from the tuplesort. The minimum
cost would be the cost of sorting all BRIN ranges, plus sorting all
tuples in one of the index's ranges.
I'm not a tuplesort expert, but my assumption is that it's better to
sort smaller amounts of rows - which is why the patch sorts only the
rows it knows it can actually output.
Kind regards,
Matthias van de Meent
PS. Are you still planning on giving the HOT optimization for BRIN a
second try? I'm fairly confident that my patch at [0] would fix the
issue that led to the revert of that feature, but it introduced ABI
changes after the feature freeze and thus it didn't get in. The patch
might need some polishing, but I think it shouldn't take too much
extra effort to get into PG16.
Thanks for reminding me, I'll take a look before the next CF.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 17 Oct 2022 at 05:43, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 10/16/22 22:17, Matthias van de Meent wrote:
On Sun, 16 Oct 2022 at 16:34, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
Why sort on maxval and minval? That seems wasteful for effectively all
sorts, where range sort on minval should suffice: If you find a range
that starts at 100 in a list of ranges sorted at minval, you've
processed all values <100. You can't make a similar comparison when
that range is sorted on maxvals.
Because that allows us to identify overlapping ranges quickly.
Imagine you have the ranges sorted by maxval, which allows you to add
tuples in small increments. But how do you know there's not a range
(possibly with arbitrarily high maxval), that however overlaps with the
range we're currently processing?
Why do we need to identify overlapping ranges specifically? If you
sort by minval, it becomes obvious that any subsequent range cannot
contain values < the minval of the next range in the list, allowing
you to emit any values less than the next, unprocessed, minmax range's
minval.
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from "splill" tuplestore, compare to maxval
Instead of this, shouldn't an update to tuplesort that allows for
restarting the sort be better than this? Moving tuples that we've
accepted into BRINsort state but not yet returned around seems like a
waste of cycles, and I can't think of a reason why it can't work.
I don't understand what you mean by "update to tuplesort". Can you
elaborate?
Tuplesort currently only allows the following workflow: you load
tuples, then call finalize, then extract tuples. There is currently no
way to add tuples once you've started extracting them.
For my design to work efficiently or without hacking into the
internals of tuplesort, we'd need a way to restart or 'un-finalize'
the tuplesort so that it returns to the 'load tuples' phase. Because
all data of the previous iteration is already sorted, adding more data
shouldn't be too expensive.
The point of spilling them into a tuplestore is to make the sort cheaper
by not sorting tuples that can't possibly be produced, because the value
exceeds the current maxval. Consider ranges sorted by maxval
[...]
Or maybe I just don't understand what you mean.
If we sort the ranges by minval like this:
1. [0,1000]
2. [0,999]
3. [50,998]
4. [100,997]
5. [100,996]
6. [150,995]
Then we can load and sort the values for range 1 and 2, and emit all
values up to (not including) 50 - the minval of the next,
not-yet-loaded range in the ordered list of ranges. Then add the
values from range 3 to the set of tuples we have yet to output; sort;
and then emit values up to 100 (range 4's minval), etc. This reduces
the amount of tuples in the tuplesort to the minimum amount needed to
output any specific value.
If the ranges are sorted and loaded by maxval, like your algorithm expects:
1. [150,995]
2. [100,996]
3. [100,997]
4. [50,998]
5. [0,999]
6. [0,1000]
We need to load all ranges into the sort before it could start
emitting any tuples, as all ranges overlap with the first range.
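A toy model of this minval-driven scheme (plain Python, not the proposed API; a heap stands in for the restartable tuplesort, and tuples_for is a hypothetical per-range tuple source):

```python
import heapq

# Toy model of the minval-ordered scheme: ranges sorted by minval; a value
# is safe to emit once it is strictly below the next not-yet-loaded
# range's minval.
def brinsort_minval(ranges, tuples_for):
    ranges = sorted(ranges, key=lambda r: r[0])
    heap, out = [], []          # heap = stand-in for restartable tuplesort
    for i, rng in enumerate(ranges):
        for v in tuples_for(rng):
            heapq.heappush(heap, v)
        # boundary: minval of the next unloaded range (+inf after the last)
        bound = ranges[i + 1][0] if i + 1 < len(ranges) else float("inf")
        while heap and heap[0] < bound:
            out.append(heapq.heappop(heap))
    return out

# Hypothetical data; each range's values stay within [minval, maxval].
data = {(0, 10): [7, 0, 10], (5, 20): [20, 5, 9], (15, 30): [30, 15]}
print(brinsort_minval(data.keys(), lambda r: data[r]))
```

The heap only ever holds tuples from ranges whose minval is below the current boundary, mirroring the "minimum amount needed to output any specific value" property described above.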
[algo]
I don't think this works, because we may get a range (Rs') with very
high maxval (thus read very late from Rs), but with very low minval.
AFAICS max_sorted must never go back, and this breaks it.
max_sorted cannot go back, because it is the min value of the next
range in the list of ranges sorted by min value; see also above.
There is a small issue in my algorithm where I use <= for yielding
values where it should be <, in which case the initialization of
max_sorted to NULL is then incorrect, but apart from that I don't think
there are any issues with the base algorithm.
The maximum cost of this tuplesort would be the cost of sorting a
seqscanned table, plus sorting the relevant BRIN ranges, plus the 1
extra compare per tuple and range that are needed to determine whether
the range or tuple should be extracted from the tuplesort. The minimum
cost would be the cost of sorting all BRIN ranges, plus sorting all
tuples in one of the index's ranges.
I'm not a tuplesort expert, but my assumption is that it's better to
sort smaller amounts of rows - which is why the patch sorts only the
rows it knows it can actually output.
I see that the two main differences between our designs are in
answering these questions:
- How do we select table ranges for processing?
- How do we handle tuples that we know we can't output yet?
For the first, I think the differences are explained above. The main
drawback of your selection algorithm seems to be that your algorithm's
worst-case is "all ranges overlap", whereas my algorithm's worst case
is "all ranges start at the same value", which is only a subset of
your worst case.
For the second, the difference is whether we choose to sort the tuples
that are out-of-bounds, but are already in the working set due to
being returned from a range overlapping with the current bound.
My algorithm tries to reduce the overhead of increasing the sort
boundaries by also sorting the out-of-bound data, allowing for
O(n-less-than-newbound) overhead of extending the bounds (total
complexity for whole sort O(n-out-of-bound)), and O(n log n)
processing of all tuples during insertion.
Your algorithm - if I understand it correctly - seems to optimize for
faster results within the current bound by not sorting the
out-of-bounds data with O(1) processing when out-of-bounds, at the
cost of needing O(n-out-of-bound-tuples) operations when the maxval /
max_sorted boundary is increased, with a complexity of O(n*m) for an
average of n out-of-bound tuples and m bound updates.
Lastly, there is the small difference in how the ranges are extracted
from BRIN: I prefer and mention an iterative approach where the tuples
are extracted from the index and loaded into a tuplesort in some
iterative fashion (which spills to disk and does not need all tuples
to reside in memory), whereas your current approach was mentioned as
(paraphrasing) 'allocate all this data in one chunk and hope that
there is enough memory available'. I think this is not so much a
disagreement in best approach, but mostly a case of what could be made
to work; so in later updates I hope we'll see improvements here.
Kind regards,
Matthias van de Meent
On 10/17/22 16:00, Matthias van de Meent wrote:
On Mon, 17 Oct 2022 at 05:43, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 10/16/22 22:17, Matthias van de Meent wrote:
On Sun, 16 Oct 2022 at 16:34, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
Try to formulate the whole algorithm. Maybe I'm missing something.
The current algorithm is something like this:
1. request info about ranges from the BRIN opclass
2. sort them by maxval and minval
Why sort on maxval and minval? That seems wasteful for effectively all
sorts, where range sort on minval should suffice: If you find a range
that starts at 100 in a list of ranges sorted at minval, you've
processed all values <100. You can't make a similar comparison when
that range is sorted on maxvals.
Because that allows us to identify overlapping ranges quickly.
Imagine you have the ranges sorted by maxval, which allows you to add
tuples in small increments. But how do you know there's not a range
(possibly with arbitrarily high maxval), that however overlaps with the
range we're currently processing?
Why do we need to identify overlapping ranges specifically? If you
sort by minval, it becomes obvious that any subsequent range cannot
contain values < the minval of the next range in the list, allowing
you to emit any values less than the next, unprocessed, minmax range's
minval.
D'oh! I think you're right, it should be possible to do this with only
a sort by minval. And it might actually be a better way to do that.
I think I chose the "maxval" ordering because it seemed reasonable.
Looking at the current range and using the maxval as the threshold
seemed reasonable. But it leads to a bunch of complexity with the
intersecting ranges, and I never reconsidered this choice. Silly me.
3. NULLS FIRST: read all ranges that might have NULLs => output
4. read the next range (by maxval) into tuplesort
(if no more ranges, go to (9))
5. load all tuples from "splill" tuplestore, compare to maxval
Instead of this, shouldn't an update to tuplesort that allows for
restarting the sort be better than this? Moving tuples that we've
accepted into BRINsort state but not yet returned around seems like a
waste of cycles, and I can't think of a reason why it can't work.
I don't understand what you mean by "update to tuplesort". Can you
elaborate?
Tuplesort currently only allows the following workflow: you load
tuples, then call finalize, then extract tuples. There is currently no
way to add tuples once you've started extracting them.
For my design to work efficiently or without hacking into the
internals of tuplesort, we'd need a way to restart or 'un-finalize'
the tuplesort so that it returns to the 'load tuples' phase. Because
all data of the previous iteration is already sorted, adding more data
shouldn't be too expensive.
Not sure. I still think it's better to limit the amount of data we have
in the tuplesort. Even if the tuplesort can efficiently skip the already
sorted part, it'll still occupy disk space, possibly even force the data
to disk etc. (We'll still have to write that into a tuplestore, but that
should be relatively small and short-lived/recycled).
FWIW I wonder if the assumption that tuplesort can quickly skip already
sorted data holds e.g. for tuplesorts much larger than work_mem, but I
haven't checked that.
I'd also like to include some more info in the explain, like how many
times we did a sort, and what was the largest amount of data we sorted.
Although, maybe that could be tracked by tracking the tuplesort size of
the last sort.
Considering the tuplesort does not currently support this, I'll probably
stick to the existing approach with separate tuplestore. There's enough
complexity in the patch already, I think. The only thing we'll need with
the minval ordering is the ability to "peek ahead" to the next minval
(which is going to be the threshold used to route values either to
tuplesort or tuplestore).
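That routing could be as simple as this toy sketch (hypothetical names; the peeked-ahead next minval acts as the threshold):

```python
# Toy model of threshold routing: values below the next range's minval go
# to the "tuplesort" (safe to sort and emit now); the rest go to the
# "tuplestore" spill, to be reconsidered once the threshold advances.
def route(values, next_minval):
    to_sort, to_spill = [], []
    for v in values:
        (to_sort if v < next_minval else to_spill).append(v)
    return to_sort, to_spill

s, spill = route([3, 57, 12, 49, 80], next_minval=50)
print(sorted(s), spill)  # [3, 12, 49] [57, 80]
```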
The point of spilling them into a tuplestore is to make the sort cheaper
by not sorting tuples that can't possibly be produced, because the value
exceeds the current maxval. Consider ranges sorted by maxval
[...]
Or maybe I just don't understand what you mean.
If we sort the ranges by minval like this:
1. [0,1000]
2. [0,999]
3. [50,998]
4. [100,997]
5. [100,996]
6. [150,995]
Then we can load and sort the values for range 1 and 2, and emit all
values up to (not including) 50 - the minval of the next,
not-yet-loaded range in the ordered list of ranges. Then add the
values from range 3 to the set of tuples we have yet to output; sort;
and then emit values up to 100 (range 4's minval), etc. This reduces
the amount of tuples in the tuplesort to the minimum amount needed to
output any specific value.
If the ranges are sorted and loaded by maxval, like your algorithm expects:
1. [150,995]
2. [100,996]
3. [100,997]
4. [50,998]
5. [0,999]
6. [0,1000]
We need to load all ranges into the sort before it could start
emitting any tuples, as all ranges overlap with the first range.
Right, thanks - I get this now.
[algo]
I don't think this works, because we may get a range (Rs') with very
high maxval (thus read very late from Rs), but with very low minval.
AFAICS max_sorted must never go back, and this breaks it.max_sorted cannot go back, because it is the min value of the next
range in the list of ranges sorted by min value; see also above.There is a small issue in my algorithm where I use <= for yielding
values where it should be <, where initialization of max_value to NULL
is then be incorrect, but apart from that I don't think there are any
issues with the base algorithm.The maximum cost of this tuplesort would be the cost of sorting a
seqscanned table, plus sorting the relevant BRIN ranges, plus the 1
extra compare per tuple and range that are needed to determine whether
the range or tuple should be extracted from the tuplesort. The minimum
cost would be the cost of sorting all BRIN ranges, plus sorting all
tuples in one of the index's ranges.I'm not a tuplesort expert, but my assumption it's better to sort
smaller amounts of rows - which is why the patch sorts only the rows it
knows it can actually output.
I see that the two main differences between our designs are in
answering these questions:
- How do we select table ranges for processing?
- How do we handle tuples that we know we can't output yet?
For the first, I think the differences are explained above. The main
drawback of your selection algorithm seems to be that your algorithm's
worst-case is "all ranges overlap", whereas my algorithm's worst case
is "all ranges start at the same value", which is only a subset of
your worst case.
Right, those are very good points.
For the second, the difference is whether we choose to sort the tuples
that are out-of-bounds, but are already in the working set due to
being returned from a range overlapping with the current bound.
My algorithm tries to reduce the overhead of increasing the sort
boundaries by also sorting the out-of-bound data, allowing for
O(n-less-than-newbound) overhead of extending the bounds (total
complexity for whole sort O(n-out-of-bound)), and O(n log n)
processing of all tuples during insertion.
Your algorithm - if I understand it correctly - seems to optimize for
faster results within the current bound by not sorting the
out-of-bounds data with O(1) processing when out-of-bounds, at the
cost of needing O(n-out-of-bound-tuples) operations when the maxval /
max_sorted boundary is increased, with a complexity of O(n*m) for an
average of n out-of-bound tuples and m bound updates.
Right. I wonder if these are actually complementary approaches, and
we could/should pick between them depending on how many rows we expect
to consume.
My focus was LIMIT queries, so I favored the approach with the lowest
startup cost. I haven't quite planned for this to work so well even in
full-sort cases. That kinda surprised me (I wonder if the very large
tuplesorts - compared to work_mem - would hurt this, though).
Lastly, there is the small difference in how the ranges are extracted
from BRIN: I prefer and mention an iterative approach where the tuples
are extracted from the index and loaded into a tuplesort in some
iterative fashion (which spills to disk and does not need all tuples
to reside in memory), whereas your current approach was mentioned as
(paraphrasing) 'allocate all this data in one chunk and hope that
there is enough memory available'. I think this is not so much a
disagreement in best approach, but mostly a case of what could be made
to work; so in later updates I hope we'll see improvements here.
Right. I think I mentioned this in my post [1], where I also envisioned
some sort of iterative approach. And I think you're right the approach
with ordering by minval is naturally more suitable because it just
consumes the single sequence of ranges.
regards
[1]: /messages/by-id/1a7c2ff5-a855-64e9-0272-1f9947f8a558@enterprisedb.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
here's an updated/reworked version of the patch, on top of the "BRIN
statistics" patch as 0001 (because some of the stuff is useful, but we
can ignore this part in this thread).
Warning: I realized the new node is somewhat broken when it comes to
projection and matching the indexed column, most likely because the
targetlists are wired/processed incorrectly or something like that. So
when experimenting with this, just index the first column of the table
and don't do anything requiring a projection. I'll get this fixed, but
I've been focusing on the other stuff. I'm not particularly familiar
with this tlist/project stuff, so any help is welcome.
The main change in this version is the adoption of multiple ideas
suggested by Matthias in his earlier responses.
Firstly, this changes how the index opclass passes information to the
executor node. Instead of using a plain array, we now use a tuplesort.
This addresses the memory consumption issues with a large number of
ranges, and it also simplifies the sorting etc. which is now handled by
the tuplesort. The support procedure simply fills a tuplesort and then
hands it over to the caller (more or less).
Secondly, instead of ordering the ranges by maxval, this orders them by
minval (as suggested by Matthias), which greatly simplifies the code
because we don't need to detect overlapping ranges etc.
More precisely, the ranges are sorted to get this ordering
- not yet summarized ranges
- ranges sorted by (minval, blkno)
- all-nulls ranges
This particular ordering is beneficial for the algorithm, which does two
passes over the ranges. For the NULLS LAST case (i.e. the default), we
do this:
- produce tuples with non-NULL values, ordered by the value
- produce tuples with NULL values (arbitrary order)
And each of these phases does a separate pass over the ranges (I'll get
to that in a minute). And the ordering is tailored to this.
Note: For DESC we'd sort by maxval, and for NULLS FIRST the phases would
happen in the opposite order, but those are details. Let's assume ASC
ordering with NULLS LAST, unless stated otherwise.
The idea here is that all not-summarized ranges always need to be
processed, both in the NULL pass and the non-NULL pass, which happen as
two separate passes over the ranges.
The all-null ranges don't need to be processed during the non-NULL pass,
and we can terminate this pass early once we hit the first null-only
range. So placing them last helps with this.
The regular ranges are ordered by minval, as dictated by the algorithm
(which is now described in nodeBrinSort.c comment), but we also sort
them by blkno to make this a bit more sequential (but this only matters
for ranges with the same minval, and that's probably rare, but the extra
sort key is also cheap so why not).
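The three-group ordering could be expressed as a sort key along these
lines (an illustrative Python sketch; the dict-based range representation
is mine, not the patch's actual structs):

```python
def range_sort_key(r):
    """Order ranges for the ASC / NULLS LAST pass:
    group 0: not-yet-summarized ranges (needed by both passes),
    group 1: regular ranges, ordered by (minval, blkno),
    group 2: all-nulls ranges (lets the non-NULL pass stop early)."""
    if not r["summarized"]:
        return (0, 0, r["blkno"])
    if r["all_nulls"]:
        return (2, 0, r["blkno"])
    return (1, r["minval"], r["blkno"])

ranges = [
    {"blkno": 0,  "summarized": True,  "all_nulls": False, "minval": 10},
    {"blkno": 4,  "summarized": False, "all_nulls": False, "minval": None},
    {"blkno": 8,  "summarized": True,  "all_nulls": True,  "minval": None},
    {"blkno": 12, "summarized": True,  "all_nulls": False, "minval": 10},
]
ordered = sorted(ranges, key=range_sort_key)
```

Here the not-summarized range sorts first, the two regular ranges with
equal minval fall back to blkno order, and the all-nulls range sorts last.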
I mentioned we do two separate passes - one for non-NULL values, one for
NULL values. That may be somewhat inefficient, because in extreme cases
we might end up scanning the whole table twice (imagine BRIN ranges
where each range has both regular values and NULLs). It might be
possible to do all of this in a single pass, at least in some cases -
for example while scanning ranges, we might stash NULL rows into a
tuplestore, so that the second pass is not needed. That assumes there
are not too many such rows (otherwise we might need to write and then
read many rows, outweighing the cost of just doing two passes). This
should be possible to estimate/cost fairly well, I think, and the
comment in nodeBrinSort actually presents some ideas about this.
And we can't do that for the NULLS FIRST case, because if we stash the
non-NULL rows somewhere, we won't be able to do the "incremental" sort,
i.e. we might just do regular Sort right away. So I've kept this simple
approach with two passes for now.
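The stash-the-NULLs idea for the NULLS LAST case could be sketched like
this (illustrative Python; the plain list stands in for a tuplestore that
would spill to disk):

```python
def scan_nulls_last_single_pass(rows):
    """Single pass for NULLS LAST: feed non-NULL values through to the
    (incremental) sort, stashing NULL rows on the side so no second
    pass over the table is needed. This pays off only when NULLs are
    rare; otherwise writing and re-reading the stash may cost more
    than simply rescanning the ranges.
    """
    null_stash = []                 # stand-in for a tuplestore
    for v in rows:
        if v is None:
            null_stash.append(v)
        else:
            yield v                 # feeds the incremental sort phase
    yield from null_stash           # NULLS LAST, arbitrary order
```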
This still uses the approach with spilling tuples to a tuplestore, and
only sorting rows that we know are safe to output. I still think this is
a good approach, for the reasons I explained before, but changing this
is not hard so we can experiment.
There's however a related question - how quickly should we increment the
minval value, serving as a watermark? One option is to go to the next
distinct minval value - but that may result in an excessive number of tiny
sorts, because the number of ranges and rows between the old and new minval
values tends to be small. Another negative consequence is that this may
cause a lot of spilling (and re-spilling), because we only consume a tiny
number of rows from the tuplestore after incrementing the watermark.
Or we can do larger steps, skipping some of the minval values, so that
more rows qualify for the sort. Of course, too large a step means we'll
exceed work_mem and switch to an on-disk sort, which we probably don't
want. Also, this may be the wrong thing to do for LIMIT queries, that
only need a couple rows, and a tiny sort is fine (because we won't do
too many of them).
Patch 0004 introduces a new GUC called brinsort_watermark_step, that can
be used to experiment with this. By default it's set to '1' which means
we simply progress to the next minval value.
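The trade-off can be illustrated with a toy simulation of watermark
stepping (Python sketch, not patch code; `step` plays the role of
brinsort_watermark_step, and ranges are simplified to (minval, values)
pairs):

```python
def brin_sort_rounds(ranges, step):
    """Simulate BRIN Sort rounds: each round advances the watermark by
    `step` distinct minvals, sorts the rows that qualify, and emits
    them. Returns the output order plus the size of each per-round
    sort - larger steps mean fewer but bigger sorts.
    """
    ranges = sorted(ranges, key=lambda r: r[0])
    minvals = sorted({mv for mv, _ in ranges})
    out, sort_sizes = [], []
    spilled, next_range = [], 0
    for i in range(step, len(minvals) + step, step):
        watermark = minvals[i] if i < len(minvals) else None
        # Load every range whose minval falls below the new watermark.
        while next_range < len(ranges) and (
                watermark is None or ranges[next_range][0] < watermark):
            spilled.extend(ranges[next_range][1])
            next_range += 1
        # Sort only rows below the watermark; the rest stay spilled.
        ready = [v for v in spilled if watermark is None or v < watermark]
        spilled = [v for v in spilled
                   if watermark is not None and v >= watermark]
        sort_sizes.append(len(ready))
        out.extend(sorted(ready))
    return out, sort_sizes
```

Both step sizes produce the same sorted output; they only differ in how
many rows each round's sort has to handle.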
Then 0005 tries to customize this based on statistics - we estimate the
number of rows each minval increment is expected to "add", and then
pick a step value that won't overflow work_mem. This happens in
create_brinsort_plan, and the comment explains the main weakness - the
way the number of rows is estimated is somewhat naive, as it just
divides reltuples by number of ranges. But I have a couple ideas about
what statistics we might collect, explained in 0001 in the comment at
brin_minmax_stats.
But there's another option - we can tune the step based on past sorts.
If we see the sorts are doing on-disk sort, maybe try doing smaller
steps. Patch 0006 implements a very simple variant of this. There's a
couple ideas about how it might be improved, mentioned in the comment at
brinsort_adjust_watermark_step.
There's also patch 0003, which extends the EXPLAIN output with a
counters tracking the number of sorts, counts of on-disk/in-memory
sorts, space used, number of rows sorted/spilled, and so on. This is
useful when analyzing e.g. the effect of higher/lower watermark steps,
discussed in the preceding paragraphs.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Allow-index-AMs-to-build-and-use-custom-sta-20221022.patch
0002-Allow-BRIN-indexes-to-produce-sorted-output-20221022.patch
0003-wip-brinsort-explain-stats-20221022.patch
0004-wip-multiple-watermark-steps-20221022.patch
0005-wip-adjust-watermark-step-20221022.patch
0006-wip-adaptive-watermark-step-20221022.patch
On Sat, Oct 15, 2022 at 02:33:50PM +0200, Tomas Vondra wrote:
Of course, if there are e.g. BTREE indexes this is going to be slower,
but people are unlikely to have both index types on the same column.
On Sun, Oct 16, 2022 at 05:48:31PM +0200, Tomas Vondra wrote:
I don't think it's all that unfair. How likely is it to have both a BRIN
and btree index on the same column? And even if you do have such indexes
Note that we (at my work) use unique, btree indexes on multiple columns
for INSERT ON CONFLICT into the most-recent tables: UNIQUE(a,b,c,...),
plus a separate set of indexes on all tables, used for searching:
BRIN(a) and BTREE(b). I'd hope that the costing is accurate enough to
prefer the btree index for searching the most-recent table, if that's
what's faster (for example, if columns b and c are specified).
+ /* There must not be any TID scan in progress yet. */
+ Assert(node->ss.ss_currentScanDesc == NULL);
+
+ /* Initialize the TID range scan, for the provided block range. */
+ if (node->ss.ss_currentScanDesc == NULL)
+ {
Why is this conditional on the condition that was just Assert()ed ?
+void
+cost_brinsort(BrinSortPath *path, PlannerInfo *root, double loop_count,
+              bool partial_path)
It'd be nice to refactor existing code to avoid this part being so
duplicative.
+ * In some situations (particularly with OR'd index conditions) we may
+ * have scan_clauses that are not equal to, but are logically implied by,
+ * the index quals; so we also try a predicate_implied_by() check to see
Isn't that somewhat expensive ?
If that's known, then it'd be good to say that in the documentation.
+ {
+ {"enable_brinsort", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of BRIN sort plans."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_brinsort,
+ false,
I think new GUCs should be enabled during patch development.
Maybe in a separate 0002 patch "for CI only not for commit".
That way "make check" at least has a chance to hit the new code paths.
Also, note that indxpath.c had the var initialized to true.
+ attno = (i + 1);
+ nranges = (nblocks / pagesPerRange);
+ node->bs_phase = (nullsFirst) ? BRINSORT_LOAD_NULLS : BRINSORT_LOAD_RANGE;
I'm curious why you have parentheses in these places ?
+#ifndef NODEBrinSort_H
+#define NODEBrinSort_H
NODEBRIN_SORT would be more consistent with NODEINCREMENTALSORT.
But I'd prefer NODE_* - otherwise it looks like NO DEBRIN.
This needed a bunch of work to pass any of the regression tests -
even with the feature set to off.
. meson.build needs the same change as the corresponding ./Makefile.
. guc missing from postgresql.conf.sample
. brin_validate.c is missing support for the opr function.
I gather you're planning on changing this part (?) but this allows to
pass tests for now.
. mingw is warning about OidIsValid(pointer) in nodeBrinSort.c.
https://cirrus-ci.com/task/5771227447951360?logs=mingw_cross_warning#L969
. Uninitialized catalog attribute.
. Some typos in your other patches: "heuristics heuristics". ste.
lest (least).
--
Justin
Attachments:
0001-Allow-index-AMs-to-build-and-use-custom-statistics.patch
0002-f-Allow-index-AMs-to-build-and-use-custom-statistics.patch
0003-Allow-BRIN-indexes-to-produce-sorted-output.patch
0004-f-brinsort.patch
On 10/24/22 06:32, Justin Pryzby wrote:
On Sat, Oct 15, 2022 at 02:33:50PM +0200, Tomas Vondra wrote:
Of course, if there are e.g. BTREE indexes this is going to be slower,
but people are unlikely to have both index types on the same column.
On Sun, Oct 16, 2022 at 05:48:31PM +0200, Tomas Vondra wrote:
I don't think it's all that unfair. How likely is it to have both a BRIN
and btree index on the same column? And even if you do have such indexes
Note that we (at my work) use unique, btree indexes on multiple columns
for INSERT ON CONFLICT into the most-recent tables: UNIQUE(a,b,c,...),
plus a separate set of indexes on all tables, used for searching:
BRIN(a) and BTREE(b). I'd hope that the costing is accurate enough to
prefer the btree index for searching the most-recent table, if that's
what's faster (for example, if columns b and c are specified).
Well, the costing is very crude at the moment - at the moment it's
pretty much just a copy of the existing BRIN costing. And the cost is
likely going to increase, because brinsort needs to do regular BRIN
bitmap scan (more or less) and then also a sort (which is an extra cost,
of course). So if it works now, I don't see why brinsort would break it.
Moreover, if you don't have ORDER BY in the query, I don't see why we
would create a brinsort at all.
But if you could test this once the costing gets improved, that'd be
very valuable.
+ /* There must not be any TID scan in progress yet. */
+ Assert(node->ss.ss_currentScanDesc == NULL);
+
+ /* Initialize the TID range scan, for the provided block range. */
+ if (node->ss.ss_currentScanDesc == NULL)
+ {
Why is this conditional on the condition that was just Assert()ed ?
Yeah, that's a mistake, due to how the code evolved.
+void
+cost_brinsort(BrinSortPath *path, PlannerInfo *root, double loop_count,
+              bool partial_path)
It'd be nice to refactor existing code to avoid this part being so
duplicative.
+ * In some situations (particularly with OR'd index conditions) we may
+ * have scan_clauses that are not equal to, but are logically implied by,
+ * the index quals; so we also try a predicate_implied_by() check to see
Isn't that somewhat expensive ?
If that's known, then it'd be good to say that in the documentation.
Some of this is probably a residue from create_indexscan_path and may
not be needed for this new node.
+ {
+ {"enable_brinsort", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of BRIN sort plans."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_brinsort,
+ false,
I think new GUCs should be enabled during patch development.
Maybe in a separate 0002 patch "for CI only not for commit".
That way "make check" at least has a chance to hit the new code paths.
Also, note that indxpath.c had the var initialized to true.
Good point.
+ attno = (i + 1);
+ nranges = (nblocks / pagesPerRange);
+ node->bs_phase = (nullsFirst) ? BRINSORT_LOAD_NULLS : BRINSORT_LOAD_RANGE;
I'm curious why you have parentheses in these places ?
Not sure, it seemed more readable when writing the code I guess.
+#ifndef NODEBrinSort_H
+#define NODEBrinSort_H
NODEBRIN_SORT would be more consistent with NODEINCREMENTALSORT.
But I'd prefer NODE_* - otherwise it looks like NO DEBRIN.
Yeah, stupid search/replace on the indescan code, which was used as a
starting point.
This needed a bunch of work to pass any of the regression tests -
even with the feature set to off.
. meson.build needs the same change as the corresponding ./Makefile.
. guc missing from postgresql.conf.sample
. brin_validate.c is missing support for the opr function.
I gather you're planning on changing this part (?) but this allows to
pass tests for now.
. mingw is warning about OidIsValid(pointer) in nodeBrinSort.c.
https://cirrus-ci.com/task/5771227447951360?logs=mingw_cross_warning#L969
. Uninitialized catalog attribute.
. Some typos in your other patches: "heuristics heuristics". ste.
lest (least).
Thanks, I'll get this fixed. I've posted the patch as a PoC to showcase
it and gather some feedback, I should have mentioned it's incomplete in
these ways.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company