Introduce Index Aggregate - new GROUP BY strategy

Started by Sergey Soloviev, about 1 month ago. 10 messages
#1 Sergey Soloviev
sergey.soloviev@tantorlabs.ru
4 attachment(s)

Hi, hackers!

I would like to introduce a new GROUP BY strategy called Index Aggregate.
In a nutshell, we build a B+tree index whose keys are the GROUP BY
attributes; if the memory limit is reached, we build an index for each
batch and spill it to disk as a sorted run, performing a final external merge.

It works (and is implemented) much like Hash Aggregate; most of the
differences are in the spill logic:

1. As tuples arrive, build an in-memory B+tree index.
2. If the memory limit is reached, switch to spill mode (almost like hashagg):
     - calculate the hash of the tuple
     - decide which batch it should be stored in
     - spill the tuple to that batch
3. When all tuples are processed and nothing was spilled to disk, return all
    tuples from the in-memory index.
4. Otherwise:
     1. Spill the current index to disk, creating the initial sorted run.
     2. Re-read each batch, building an in-memory index (which may spill again).
     3. At the end of each batch, spill it to disk, creating another sorted run.
     4. Perform the final external merge sort.

The main benefit of this strategy is that we perform both grouping and sorting
at the same time, with early aggregation. Its cost is therefore calculated for
both grouping and comparison, but we can win thanks to early aggregation
(which a Sort + Group node combination does not support).

While I was fixing the tests, most of the changes were in partition_aggregate.out.
The output changed like this:

```
CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c);
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('0000', '0001', '0002', '0003', '0004');
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008');
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011');
INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30 FROM generate_series(0, 2999) i;
ANALYZE pagg_tab;

EXPLAIN (COSTS OFF)
SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;

-- Old
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Limit  (cost=80.18..80.18 rows=1 width=13)
   ->  Sort  (cost=80.18..80.21 rows=12 width=13)
         Sort Key: pagg_tab.c
         ->  HashAggregate  (cost=80.00..80.12 rows=12 width=13)
               Group Key: pagg_tab.c
               ->  Append  (cost=0.00..65.00 rows=3000 width=5)
                     ->  Seq Scan on pagg_tab_p1 pagg_tab_1 (cost=0.00..20.50 rows=1250 width=5)
                     ->  Seq Scan on pagg_tab_p2 pagg_tab_2 (cost=0.00..17.00 rows=1000 width=5)
                     ->  Seq Scan on pagg_tab_p3 pagg_tab_3 (cost=0.00..12.50 rows=750 width=5)

-- New
SET enable_hashagg to off;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=129.77..129.49 rows=1 width=13)
   ->  IndexAggregate  (cost=129.77..126.39 rows=12 width=13)
         Group Key: pagg_tab.c
         ->  Append  (cost=0.00..65.00 rows=3000 width=5)
               ->  Seq Scan on pagg_tab_p1 pagg_tab_1 (cost=0.00..20.50 rows=1250 width=5)
               ->  Seq Scan on pagg_tab_p2 pagg_tab_2 (cost=0.00..17.00 rows=1000 width=5)
               ->  Seq Scan on pagg_tab_p3 pagg_tab_3 (cost=0.00..12.50 rows=750 width=5)
(7 rows)

```

There is a cheat here - hashagg is disabled - but if we actually run this,
then (on my PC) we will see that the index aggregate executes faster:

```
-- sort + hash
SET enable_hashagg TO on;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80.18..80.18 rows=1 width=13) (actual time=2.040..2.041 rows=1.00 loops=1)
   Buffers: shared hit=20
   ->  Sort  (cost=80.18..80.21 rows=12 width=13) (actual time=2.039..2.040 rows=1.00 loops=1)
         Sort Key: pagg_tab.c
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=20
         ->  HashAggregate  (cost=80.00..80.12 rows=12 width=13) (actual time=2.025..2.028 rows=12.00 loops=1)
               Group Key: pagg_tab.c
               Batches: 1  Memory Usage: 32kB
               Buffers: shared hit=20
               ->  Append  (cost=0.00..65.00 rows=3000 width=5) (actual time=0.017..0.888 rows=3000.00 loops=1)
                     Buffers: shared hit=20
                     ->  Seq Scan on pagg_tab_p1 pagg_tab_1 (cost=0.00..20.50 rows=1250 width=5) (actual time=0.016..0.301 rows=1250.00 loops=1)
                           Buffers: shared hit=8
                     ->  Seq Scan on pagg_tab_p2 pagg_tab_2 (cost=0.00..17.00 rows=1000 width=5) (actual time=0.007..0.225 rows=1000.00 loops=1)
                           Buffers: shared hit=7
                     ->  Seq Scan on pagg_tab_p3 pagg_tab_3 (cost=0.00..12.50 rows=750 width=5) (actual time=0.006..0.171 rows=750.00 loops=1)
                           Buffers: shared hit=5
 Planning Time: 0.119 ms
 Execution Time: 2.076 ms
(20 rows)

-- index agg
SET enable_hashagg TO off;
 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=129.77..129.49 rows=1 width=13) (actual time=1.789..1.790 rows=1.00 loops=1)
   Buffers: shared hit=20
   ->  IndexAggregate  (cost=129.77..126.39 rows=12 width=13) (actual time=1.788..1.789 rows=1.00 loops=1)
         Group Key: pagg_tab.c
         Buffers: shared hit=20
         ->  Append  (cost=0.00..65.00 rows=3000 width=5) (actual time=0.020..0.865 rows=3000.00 loops=1)
               Buffers: shared hit=20
               ->  Seq Scan on pagg_tab_p1 pagg_tab_1 (cost=0.00..20.50 rows=1250 width=5) (actual time=0.020..0.290 rows=1250.00 loops=1)
                     Buffers: shared hit=8
               ->  Seq Scan on pagg_tab_p2 pagg_tab_2 (cost=0.00..17.00 rows=1000 width=5) (actual time=0.007..0.229 rows=1000.00 loops=1)
                     Buffers: shared hit=7
               ->  Seq Scan on pagg_tab_p3 pagg_tab_3 (cost=0.00..12.50 rows=750 width=5) (actual time=0.007..0.165 rows=750.00 loops=1)
                     Buffers: shared hit=5
 Planning Time: 0.105 ms
 Execution Time: 1.825 ms
(15 rows)
```

The mean IndexAgg time is about 1.8 ms versus about 2 ms for hash + sort, so
the win is about 10%.

I have also run the TPC-H tests; two queries (4 and 5) used the Index Agg node,
and this gave close to a 5% gain in execution time.

This research was inspired by Goetz Graefe's paper "Efficient sorting, duplicate
removal, grouping, and aggregation". But some of the proposed ideas are hard
to implement in PostgreSQL, e.g. using partitioned B-trees that store their
pages in shared buffers, or making use of offset-value encoding.

More details about the implementation:

1. The in-memory index is implemented as a B+tree and stores pointers to tuples.
2. The size of each B+tree node is set using a macro. Currently it is 63, which
    allows us to use some optimizations, e.g. distributing tuples uniformly
    during a page split.
3. The in-memory index supports the key abbreviation optimization.
4. tuplesort.c is used to implement the external merge sort. This is done by
    setting up its state in such a way that we can just call 'mergeruns'.
5. When we store tuples on disk during a sorted-run spill, we perform the
    projection, so the stored tuples are ready to be returned after the merge.
    This is done mostly because we already have the returning TupleDesc and do
    not have to deal with AggStatePerGroup state (which has complex logic with
    2 boolean flags).

For now a bare minimum is implemented: the in-memory index, the disk spill
logic, and EXPLAIN ANALYZE support.

There are 4 patches attached:

1. 0001-add-in-memory-btree-tuple-index.patch - adds the in-memory index,
   TupleIndex
2. 0002-introduce-AGG_INDEX-grouping-strategy-node.patch - implementation of
   the Index Aggregate grouping strategy
3. 0003-make-use-of-IndexAggregate-in-planner-and-explain.patch - the planner
   adds Index Aggregate paths to the pathlist, and EXPLAIN ANALYZE shows
   statistics for this node
4. 0004-fix-tests-for-IndexAggregate.patch - fixes test output and adds some
   extra tests for the new node

There are open questions and TODOs:

- No support for parallel execution. The main challenge here is preserving the
  sort invariant and supporting partial aggregates.
- Use a more suitable in-memory index. For example, a T-Tree is the first
  candidate.
- No sgml documentation yet.
- Fix and adapt tests. Not all tests are fixed by the 4th patch.
- Tune the planner estimate. In the example, the cost of the index agg was
  higher, but it was actually faster.

---

Sergey Soloviev

TantorLabs: https://tantorlabs.com

Attachments:

0001-add-in-memory-btree-tuple-index.patch (text/x-patch)
From 61425dbfcf3d57c6f39317f86a90314f47165580 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 15:25:41 +0300
Subject: [PATCH 1/4] add in-memory btree tuple index

This patch implements an in-memory B+tree structure. It will be used as
the index for a special type of grouping that uses an index.

The size of each node is set using a macro. For convenience it equals
2^n - 1, so for internal nodes we can efficiently calculate the size of
each page and find the split position (exactly in the middle), and for
leaf nodes we can distribute tuples between nodes uniformly (according
to the newly inserted tuple).

It supports different memory contexts for tracking memory allocations.
And just like in TupleHashTable, during lookup it uses an 'isnew'
pointer to prevent new tuple creation (e.g. when the memory limit is
reached).

It also supports the key abbreviation optimization, like tuplesort
does. But some code was copied and looks exactly the same, so it is
worth separating that logic into a separate function.
---
 src/backend/executor/execGrouping.c | 643 ++++++++++++++++++++++++++++
 src/include/executor/executor.h     |  65 +++
 src/include/nodes/execnodes.h       |  86 +++-
 3 files changed, 793 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 8b64a625ca5..91b0ceb8ff9 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -622,3 +622,646 @@ TupleHashTableMatch(struct tuplehash_hash *tb, MinimalTuple tuple1, MinimalTuple
 	econtext->ecxt_outertuple = slot1;
 	return !ExecQualAndReset(hashtable->cur_eq_func, econtext);
 }
+
+/*****************************************************************************
+ * 		Utility routines for all-in-memory btree index
+ * 
+ * These routines build a btree index for grouping tuples together (e.g., for
+ * index aggregation).  There is one entry for each not-distinct set of tuples
+ * presented.
+ *****************************************************************************/
+
+/*
+ * Representation of a searched entry in the tuple index.  It has a
+ * separate representation to avoid unnecessary memory allocations
+ * for creating a MinimalTuple for a TupleIndexEntry.
+ */
+typedef struct TupleIndexSearchEntryData
+{
+	TupleTableSlot *slot;		/* search TupleTableSlot */
+	Datum	key1;				/* first searched key data */
+	bool	isnull1;			/* first searched key is null */
+} TupleIndexSearchEntryData;
+
+typedef TupleIndexSearchEntryData *TupleIndexSearchEntry;
+
+/* 
+ * compare_index_tuple_tiebreak
+ * 		Perform full comparison of tuples without key abbreviation.
+ * 
+ * Invoked if the first key (possibly abbreviated) cannot decide the
+ * comparison, so we have to compare all keys.
+ */
+static inline int
+compare_index_tuple_tiebreak(TupleIndex index, TupleIndexEntry left,
+							 TupleIndexSearchEntry right)
+{
+	HeapTupleData ltup;
+	SortSupport sortKey = index->sortKeys;
+	TupleDesc tupDesc = index->tupDesc;
+	AttrNumber	attno;
+	Datum		datum1,
+				datum2;
+	bool		isnull1,
+				isnull2;
+	int			cmp;
+
+	ltup.t_len = left->tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	ltup.t_data = (HeapTupleHeader) ((char *) left->tuple - MINIMAL_TUPLE_OFFSET);
+	tupDesc = index->tupDesc;
+
+	if (sortKey->abbrev_converter)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortAbbrevFullComparator(datum1, isnull1,
+											datum2, isnull2,
+											sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	sortKey++;
+	for (int nkey = 1; nkey < index->nkeys; nkey++, sortKey++)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortComparator(datum1, isnull1,
+								  datum2, isnull2,
+								  sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+	
+	return 0;
+}
+
+/* 
+ * compare_index_tuple
+ * 		Compare pair of tuples during index lookup
+ * 
+ * The comparison honors key abbreviation.
+ */
+static int
+compare_index_tuple(TupleIndex index,
+					TupleIndexEntry left,
+					TupleIndexSearchEntry right)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+	int	cmp = 0;
+	
+	cmp = ApplySortComparator(left->key1, left->isnull1,
+							  right->key1, right->isnull1,
+							  sortKey);
+	if (cmp != 0)
+		return cmp;
+
+	return compare_index_tuple_tiebreak(index, left, right);
+}
+
+/* 
+ * tuple_index_node_bsearch
+ * 		Perform binary search in the index node.
+ * 
+ * On return, if 'found' is true, an exact match was found and the returned
+ * index is an index into the tuples array.  Otherwise the value differs:
+ * - for internal nodes it is an index into the 'pointers' array to follow
+ * - for leaf nodes it is the index at which the new entry must be inserted.
+ */
+static int
+tuple_index_node_bsearch(TupleIndex index, TupleIndexNode node,
+						 TupleIndexSearchEntry search, bool *found)
+{
+	int low;
+	int high;
+	
+	low = 0;
+	high = node->ntuples;
+	*found = false;
+
+	while (low < high)
+	{
+		int			mid = (low + high) / 2;
+		TupleIndexEntry mid_entry = node->tuples[mid];
+		int cmp;
+
+		cmp = compare_index_tuple(index, mid_entry, search);
+		if (cmp == 0)
+		{
+			*found = true;
+			return mid;
+		}
+
+		if (cmp < 0)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	return low;
+}
+
+static inline TupleIndexNode
+IndexLeafNodeGetNext(TupleIndexNode node)
+{
+	return node->pointers[0];
+}
+
+static inline void
+IndexLeafNodeSetNext(TupleIndexNode node, TupleIndexNode next)
+{
+	node->pointers[0] = next;
+}
+
+#define SizeofTupleIndexInternalNode \
+	  (offsetof(TupleIndexNodeData, pointers) \
+	+ (TUPLE_INDEX_NODE_MAX_ENTRIES + 1) * sizeof(TupleIndexNode))
+
+#define SizeofTupleIndexLeafNode \
+	offsetof(TupleIndexNodeData, pointers) + sizeof(TupleIndexNode)
+
+static inline TupleIndexNode
+AllocLeafIndexNode(TupleIndex index, TupleIndexNode next)
+{
+	TupleIndexNode leaf;
+	leaf = MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexLeafNode);
+	IndexLeafNodeSetNext(leaf, next);
+	return leaf;
+}
+
+static inline TupleIndexNode
+AllocInternalIndexNode(TupleIndex index)
+{
+	return MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexInternalNode);
+}
+
+/* 
+ * tuple_index_node_insert_at
+ * 		Insert new tuple in the node at specified index
+ * 
+ * This function is invoked when a new tuple must be inserted into the node
+ * (both leaf and internal). For internal nodes 'pointer' must also be
+ * specified.
+ *
+ * The node must have free space available. It's up to the caller to check if
+ * the node is full and needs splitting. For a split use 'tuple_index_insert_split'.
+ */
+static inline void
+tuple_index_node_insert_at(TupleIndexNode node, bool is_leaf, int idx,
+						   TupleIndexEntry entry, TupleIndexNode pointer)
+{
+	int move_count;
+
+	Assert(node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES);
+	Assert(0 <= idx && idx <= node->ntuples);
+	move_count = node->ntuples - idx;
+
+	if (move_count > 0)
+		memmove(&node->tuples[idx + 1], &node->tuples[idx],
+			move_count * sizeof(TupleIndexEntry));
+
+	node->tuples[idx] = entry;
+
+	if (!is_leaf)
+	{
+		Assert(pointer != NULL);
+
+		if (move_count > 0)
+			memmove(&node->pointers[idx + 2], &node->pointers[idx + 1],
+					move_count * sizeof(TupleIndexNode));
+		node->pointers[idx + 1] = pointer;
+	}
+
+	node->ntuples++;
+}
+
+/* 
+ * Insert a tuple into a full node, performing a page split.
+ * 
+ * 'split_node_out' - new page containing the entries of the right half
+ * 'split_entry_out' - entry sent to the parent node as the new separator key
+ */
+static void
+tuple_index_insert_split(TupleIndex index, TupleIndexNode node, bool is_leaf,
+						 int insert_pos, TupleIndexNode *split_node_out,
+						 TupleIndexEntry *split_entry_out)
+{
+	TupleIndexNode split;
+	int split_tuple_idx;
+
+	Assert(node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+	if (is_leaf)
+	{
+		/* 
+		 * The maximum number of tuples is kept odd, so we need to decide
+		 * at which index to perform the page split. We know the split
+		 * occurs during an insert, so leave fewer entries on the page at
+		 * which the insertion will occur.
+		 */
+		if (TUPLE_INDEX_NODE_MAX_ENTRIES / 2 < insert_pos)
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2 + 1;
+		else
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+
+		split = AllocLeafIndexNode(index, IndexLeafNodeGetNext(node));
+		split->ntuples = node->ntuples - split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[node->ntuples], 
+			   sizeof(TupleIndexEntry) * split->ntuples);
+		IndexLeafNodeSetNext(node, split);
+	}
+	else
+	{
+		/* 
+		 * After a split of an internal node the split tuple is removed
+		 * (moved to the parent). The maximum number of tuples is odd, so
+		 * division by 2 handles it.
+		 */
+		split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+		split = AllocInternalIndexNode(index);
+		split->ntuples = split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[split_tuple_idx + 1],
+				sizeof(TupleIndexEntry) * split->ntuples);
+		memcpy(&split->pointers[0], &node->pointers[split_tuple_idx + 1],
+				sizeof(TupleIndexNode) * (split->ntuples + 1));
+	}
+
+	*split_node_out = split;
+	*split_entry_out = node->tuples[split_tuple_idx];
+}
+
+static inline Datum
+mintup_getattr(MinimalTuple tup, TupleDesc tupdesc, AttrNumber attnum, bool *isnull)
+{
+	HeapTupleData htup;
+
+	htup.t_len = tup->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tup - MINIMAL_TUPLE_OFFSET);
+
+	return heap_getattr(&htup, attnum, tupdesc, isnull);
+}
+
+static TupleIndexEntry
+tuple_index_node_lookup(TupleIndex index,
+						TupleIndexNode node, int level,
+						TupleIndexSearchEntry search, bool *is_new,
+						TupleIndexNode *split_node_out,
+						TupleIndexEntry *split_entry_out)
+{
+	TupleIndexEntry entry;
+	int idx;
+	bool found;
+	bool is_leaf;
+
+	TupleIndexNode insert_pointer;
+	TupleIndexEntry insert_entry;
+	bool need_insert;
+
+	Assert(level >= 0);
+
+	idx = tuple_index_node_bsearch(index, node, search, &found);
+	if (found)
+	{
+		/* 
+		 * Both internal and leaf nodes store pointers to entries, so we can
+		 * safely return an exact match found at any level.
+		 */
+		if (is_new)
+			*is_new = false;
+		return node->tuples[idx];
+	}
+
+	is_leaf = level == 0;
+	if (is_leaf)
+	{
+		MemoryContext oldcxt;
+
+		if (is_new == NULL)
+			return NULL;
+
+		oldcxt = MemoryContextSwitchTo(index->tuplecxt);
+
+		entry = palloc(sizeof(TupleIndexEntryData));
+		entry->tuple = ExecCopySlotMinimalTupleExtra(search->slot, index->additionalsize);
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/* 
+		 * key1 in the search tuple points into a TupleTableSlot which has
+		 * its own lifetime, so we must not just copy it.
+		 * 
+		 * But if key abbreviation is in use then we should copy it from the
+		 * search tuple: this is safe (pass-by-value) and an extra
+		 * recalculation can spoil the abbreviation statistics.
+		 */
+		if (index->sortKeys->abbrev_converter)
+		{
+			entry->isnull1 = search->isnull1;
+			entry->key1 = search->key1;
+		}
+		else
+		{
+			SortSupport sortKey = &index->sortKeys[0];
+			entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+										 sortKey->ssup_attno, &entry->isnull1);
+		}
+
+		index->ntuples++;
+
+		*is_new = true;
+		need_insert = true;
+		insert_pointer = NULL;
+		insert_entry = entry;
+	}
+	else
+	{
+		TupleIndexNode child_split_node = NULL;
+		TupleIndexEntry child_split_entry;
+
+		entry = tuple_index_node_lookup(index, node->pointers[idx], level - 1,
+										search, is_new,
+										&child_split_node, &child_split_entry);
+		if (entry == NULL)
+			return NULL;
+
+		if (child_split_node != NULL)
+		{
+			need_insert = true;
+			insert_pointer = child_split_node;
+			insert_entry = child_split_entry;
+		}
+		else
+			need_insert = false;
+	}
+	
+	if (need_insert)
+	{
+		Assert(insert_entry != NULL);
+
+		if (node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES)
+		{
+			TupleIndexNode split_node;
+			TupleIndexEntry split_entry;
+
+			tuple_index_insert_split(index, node, is_leaf, idx,
+									 &split_node, &split_entry);
+
+			/* adjust insertion index if the tuple is inserted into the split page */
+			if (node->ntuples < idx)
+			{
+				/* keep split tuple for leaf nodes and remove for internal */
+				if (is_leaf)
+					idx -= node->ntuples;
+				else
+					idx -= node->ntuples + 1;
+
+				node = split_node;
+			}
+
+			*split_node_out = split_node;
+			*split_entry_out = split_entry;
+		}
+
+		Assert(idx >= 0);
+		tuple_index_node_insert_at(node, is_leaf, idx, insert_entry, insert_pointer);
+	}
+
+	return entry;
+}
+
+static void
+remove_index_abbreviations(TupleIndex index)
+{
+	TupleIndexIteratorData iter;
+	TupleIndexEntry entry;
+	SortSupport sortKey = &index->sortKeys[0];
+
+	sortKey->comparator = sortKey->abbrev_full_comparator;
+	sortKey->abbrev_converter = NULL;
+	sortKey->abbrev_abort = NULL;
+	sortKey->abbrev_full_comparator = NULL;
+
+	/* now traverse all index entries and convert all existing keys */
+	InitTupleIndexIterator(index, &iter);
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 sortKey->ssup_attno, &entry->isnull1);
+}
+
+static inline void
+prepare_search_index_tuple(TupleIndex index, TupleTableSlot *slot,
+						   TupleIndexSearchEntry entry)
+{
+	SortSupport	sortKey;
+
+	sortKey = &index->sortKeys[0];
+
+	entry->slot = slot;
+	entry->key1 = slot_getattr(slot, sortKey->ssup_attno, &entry->isnull1);
+
+	/* NULL can not be abbreviated */
+	if (entry->isnull1)
+		return;
+
+	/* abbreviation is not used */
+	if (!sortKey->abbrev_converter)
+		return;
+
+	/* check if abbreviation should be removed */
+	if (index->abbrevNext <= index->ntuples)
+	{
+		index->abbrevNext *= 2;
+
+		if (sortKey->abbrev_abort(index->ntuples, sortKey))
+		{
+			remove_index_abbreviations(index);
+			return;
+		}
+	}
+
+	entry->key1 = sortKey->abbrev_converter(entry->key1, sortKey);
+}
+
+TupleIndexEntry
+TupleIndexLookup(TupleIndex index, TupleTableSlot *searchslot, bool *is_new)
+{
+	TupleIndexEntry entry;
+	TupleIndexSearchEntryData search_entry;
+	TupleIndexNode split_node = NULL;
+	TupleIndexEntry split_entry;
+	TupleIndexNode new_root;
+
+	prepare_search_index_tuple(index, searchslot, &search_entry);
+
+	entry = tuple_index_node_lookup(index, index->root, index->height,
+									&search_entry, is_new, &split_node, &split_entry);
+
+	if (entry == NULL)
+		return NULL;
+
+	if (split_node == NULL)
+		return entry;
+
+	/* root split */
+	new_root = AllocInternalIndexNode(index);
+	new_root->ntuples = 1;
+	new_root->tuples[0] = split_entry;
+	new_root->pointers[0] = index->root;
+	new_root->pointers[1] = split_node;
+	index->root = new_root;
+	index->height++;
+
+	return entry;
+}
+
+void
+InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	TupleIndexNode min_node;
+	int level;
+
+	/* iterate to the left-most node */
+	min_node = index->root;
+	level = index->height;
+	while (level-- > 0)
+		min_node = min_node->pointers[0];
+
+	iter->cur_leaf = min_node;
+	iter->cur_idx = 0;
+}
+
+TupleIndexEntry
+TupleIndexIteratorNext(TupleIndexIterator iter)
+{
+	TupleIndexNode leaf = iter->cur_leaf;
+	TupleIndexEntry tuple;
+
+	if (leaf == NULL)
+		return NULL;
+
+	/* this also handles single empty root node case */
+	if (leaf->ntuples <= iter->cur_idx)
+	{
+		leaf = iter->cur_leaf = IndexLeafNodeGetNext(leaf);
+		if (leaf == NULL)
+			return NULL;
+		iter->cur_idx = 0;
+	}
+
+	tuple = leaf->tuples[iter->cur_idx];
+	iter->cur_idx++;
+	return tuple;
+}
+
+/* 
+ * Construct an empty TupleIndex
+ *
+ * inputDesc: tuple descriptor for input tuples
+ * nkeys: number of columns to be compared (length of next 4 arrays)
+ * attNums: attribute numbers used for grouping in sort order
+ * sortOperators: Oids of sort operator families used for comparisons
+ * sortCollations: collations used for comparisons
+ * nullsFirstFlags: strategy for handling NULL values
+ * additionalsize: size of data that may be stored along with the index entry
+ * 				   used for storing per-trans information during aggregation
+ * metacxt: memory context for TupleIndex itself
+ * tuplecxt: memory context for storing MinimalTuples
+ * nodecxt: memory context for storing index nodes
+ */
+TupleIndex
+BuildTupleIndex(TupleDesc inputDesc,
+				int nkeys,
+				AttrNumber *attNums,
+				Oid *sortOperators,
+				Oid *sortCollations,
+				bool *nullsFirstFlags,
+				Size additionalsize,
+				MemoryContext metacxt,
+				MemoryContext tuplecxt,
+				MemoryContext nodecxt)
+{
+	TupleIndex index;
+	MemoryContext oldcxt;
+
+	Assert(nkeys > 0);
+
+	additionalsize = MAXALIGN(additionalsize);
+
+	oldcxt = MemoryContextSwitchTo(metacxt);
+
+	index = (TupleIndex) palloc(sizeof(TupleIndexData));
+	index->tuplecxt = tuplecxt;
+	index->nodecxt = nodecxt;
+	index->additionalsize = additionalsize;
+	index->tupDesc = CreateTupleDescCopy(inputDesc);
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->ntuples = 0;
+	index->height = 0;
+
+	index->nkeys = nkeys;
+	index->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (int i = 0; i < nkeys; ++i)
+	{
+		SortSupport sortKey = &index->sortKeys[i];
+
+		Assert(AttributeNumberIsValid(attNums[i]));
+		Assert(OidIsValid(sortOperators[i]));
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* abbreviation applies only for the first key */
+		sortKey->abbreviate = i == 0;
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/* Update abbreviation information */
+	if (index->sortKeys[0].abbrev_converter != NULL)
+	{
+		index->abbrevUsed = true;
+		index->abbrevNext = 10;
+		index->abbrevSortOp = sortOperators[0];
+	}
+	else
+		index->abbrevUsed = false;
+
+	MemoryContextSwitchTo(oldcxt);
+	return index;
+}
+
+/* 
+ * Resets contents of the index to be empty, preserving all the non-content
+ * state.
+ */
+void
+ResetTupleIndex(TupleIndex index)
+{
+	SortSupport ssup;
+
+	/* by this time indexcxt must be reset by the caller */
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->height = 0;
+	index->ntuples = 0;
+	
+	if (!index->abbrevUsed)
+		return;
+
+	/* 
+	 * If key abbreviation is used then we must reset its state.
+	 * All fields in SortSupport are already set up, but we should clear
+	 * some fields to make it look just as if we were setting this up for
+	 * the first time.
+	 */
+	ssup = &index->sortKeys[0];
+	ssup->comparator = NULL;
+	PrepareSortSupportFromOrderingOp(index->abbrevSortOp, ssup);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..6192cc8d143 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -198,6 +198,71 @@ TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
 }
 #endif
 
+extern TupleIndex BuildTupleIndex(TupleDesc inputDesc,
+								  int nkeys,
+								  AttrNumber *attNums,
+								  Oid *sortOperators,
+								  Oid *sortCollations,
+								  bool *nullsFirstFlags,
+								  Size additionalsize,
+								  MemoryContext metacxt,
+								  MemoryContext tablecxt,
+								  MemoryContext nodecxt);
+extern TupleIndexEntry TupleIndexLookup(TupleIndex index, TupleTableSlot *search,
+										bool *is_new);
+extern void ResetTupleIndex(TupleIndex index);
+
+/* 
+ * Start iteration over the tuples in the index. Only the ascending direction
+ * is supported. No modifications are allowed during iteration, because they
+ * can break the iterator.
+ */
+extern void	InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter);
+extern TupleIndexEntry TupleIndexIteratorNext(TupleIndexIterator iter);
+static inline void
+ResetTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	InitTupleIndexIterator(index, iter);
+}
+
+#ifndef FRONTEND
+
+/* 
+ * Return size of the index entry. Useful for estimating memory usage.
+ */
+static inline size_t
+TupleIndexEntrySize(void)
+{
+	return sizeof(TupleIndexEntryData);
+}
+
+/* 
+ * Get a pointer to the additional space allocated for this entry. The
+ * memory will be maxaligned and zeroed.
+ * 
+ * The amount of space available is the additionalsize requested in the call
+ * to BuildTupleIndex(). If additionalsize was specified as zero, return
+ * NULL.
+ */
+static inline void *
+TupleIndexEntryGetAdditional(TupleIndex index, TupleIndexEntry entry)
+{
+	if (index->additionalsize > 0)
+		return (char *) (entry->tuple) - index->additionalsize;
+	else
+		return NULL;
+}
+
+/* 
+ * Return tuple from index entry
+ */
+static inline MinimalTuple
+TupleIndexEntryGetMinimalTuple(TupleIndexEntry entry)
+{
+	return entry->tuple;
+}
+
+#endif
+
 /*
  * prototypes from functions in execJunk.c
  */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..99ee472b51f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -900,7 +900,91 @@ typedef tuplehash_iterator TupleHashIterator;
 #define ScanTupleHashTable(htable, iter) \
 	tuplehash_iterate(htable->hashtab, iter)
 
-
+/* ---------------------------------------------------------------
+ * 				Tuple Btree index
+ *
+ * All-in-memory tuple Btree index used for grouping and aggregating.
+ * ---------------------------------------------------------------
+ */
+
+/*
+ * Representation of a tuple in the index.  It stores both the tuple
+ * and the first key information.  If key abbreviation is used, the
+ * first key stores the abbreviated key.
+ */
+typedef struct TupleIndexEntryData
+{
+	MinimalTuple tuple;	/* actual stored tuple */
+	Datum	key1;		/* value of first key */
+	bool	isnull1;	/* first key is null */
+} TupleIndexEntryData;
+
+typedef TupleIndexEntryData *TupleIndexEntry;
+
+/*
+ * Btree node of the tuple index. Common to both internal and leaf nodes.
+ */
+typedef struct TupleIndexNodeData
+{
+	/* amount of tuples in the node */
+	int ntuples;
+
+/*
+ * Maximum number of tuples stored in a tuple index node.
+ *
+ * NOTE: use a 2^n - 1 count, so all tuples will fully utilize cache lines
+ *       (except the first, because of 'ntuples' padding)
+ */
+#define TUPLE_INDEX_NODE_MAX_ENTRIES  63
+
+	/*
+	 * Array of tuples for this page.
+	 *
+	 * For internal nodes these are separator keys;
+	 * for leaf nodes, the actual tuples.
+	 */
+	TupleIndexEntry tuples[TUPLE_INDEX_NODE_MAX_ENTRIES];
+
+	/*
+	 * For internal nodes this is an array of size
+	 * TUPLE_INDEX_NODE_MAX_ENTRIES + 1 - pointers to the nodes below.
+	 *
+	 * For leaf nodes this is an array of 1 element - the pointer to the
+	 * sibling node, required for iteration.
+	 */
+	struct TupleIndexNodeData *pointers[FLEXIBLE_ARRAY_MEMBER];
+} TupleIndexNodeData;
+
+typedef TupleIndexNodeData *TupleIndexNode;
+
+typedef struct TupleIndexData
+{
+	TupleDesc	tupDesc;		/* descriptor for stored tuples */
+	TupleIndexNode root;		/* root of the tree */
+	int		height;				/* current tree height */
+	int		ntuples;			/* number of tuples in index */
+	int		nkeys;				/* amount of keys in tuple */
+	SortSupport	sortKeys;		/* support functions for key comparison */
+	MemoryContext	tuplecxt;	/* memory context containing tuples */
+	MemoryContext	nodecxt;	/* memory context containing index nodes */
+	Size	additionalsize;		/* size of additional data for tuple */
+	int		abbrevNext;			/* next time we should check abbreviation
+								 * optimization efficiency */
+	bool	abbrevUsed;			/* true if key abbreviation optimization
+								 * was ever used */
+	Oid		abbrevSortOp;		/* sort operator for first key */
+} TupleIndexData;
+
+typedef struct TupleIndexData *TupleIndex;
+
+typedef struct TupleIndexIteratorData
+{
+	TupleIndexNode	cur_leaf;	/* current leaf node */
+	OffsetNumber	cur_idx;	/* index of tuple to return next */
+} TupleIndexIteratorData;
+
+typedef TupleIndexIteratorData *TupleIndexIterator;
+
 /* ----------------------------------------------------------------
  *				 Expression State Nodes
  *
-- 
2.43.0

Attachment: 0002-introduce-AGG_INDEX-grouping-strategy-node.patch (text/x-patch; charset=UTF-8)
From d34a6d32dca85e6d36b0d262fbc68d62a852cd81 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 16:41:58 +0300
Subject: [PATCH 2/4] introduce AGG_INDEX grouping strategy node

AGG_INDEX is a new grouping strategy that builds an in-memory index and uses
it for grouping. The main advantage of this approach is that the output is
ordered by the grouping columns, and if any ORDER BY is specified, it is
taken into account when choosing the grouping/sorting columns.

For the index it uses the B+tree implemented in the previous commit.
Overall, the implementation is very close to AGG_HASHED:

- maintain an in-memory grouping structure
- track memory consumption
- if the memory limit is reached, spill data to disk in batches (using a
  hash of the key columns)
- hash batches are processed one after another, and a new in-memory
  structure is filled for each batch

For this reason much of the code logic is generalized to support both the
index and hash implementations: functions are generalized using boolean
arguments (e.g. 'ishash'), spill-logic members of AggState are renamed
with the prefix 'spill_' instead of 'hash_', etc.

Most differences are in spill logic: to preserve sort order in case of disk
spill we must dump all indexes to disk to create sorted runs and perform
final external merge.

One problem is the external merge. It is adapted from tuplesort.c by
introducing a new operational mode, tuplemerge (with its own prefix).
Internally we just set up the state accordingly and proceed as before
without any significant code changes.

Another problem is which tuples to save into the sorted runs. We decided to
store tuples after projection (when their aggregates are finalized), because
the internal transition state is represented by a value/isnull/novalue
triple (in AggStatePerGroupData) which is quite hard to serialize and
handle; after projection, however, all GROUP BY attributes are preserved,
so we can still access them during the merge. Also, projection applies the
filter, so it can discard some tuples.
---
 src/backend/executor/execExpr.c            |   31 +-
 src/backend/executor/nodeAgg.c             | 1378 +++++++++++++++++---
 src/backend/utils/sort/tuplesort.c         |  209 ++-
 src/backend/utils/sort/tuplesortvariants.c |  105 ++
 src/include/executor/executor.h            |   10 +-
 src/include/executor/nodeAgg.h             |   33 +-
 src/include/nodes/execnodes.h              |   61 +-
 src/include/nodes/nodes.h                  |    1 +
 src/include/nodes/plannodes.h              |    2 +-
 src/include/utils/tuplesort.h              |   17 +-
 10 files changed, 1618 insertions(+), 229 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index b05ff476a63..ca53ef450ea 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -94,7 +94,7 @@ static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 								  ExprEvalStep *scratch,
 								  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-								  int transno, int setno, int setoff, bool ishash,
+								  int transno, int setno, int setoff, int strategy,
 								  bool nullcheck);
 static void ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 							 Datum *resv, bool *resnull,
@@ -3675,7 +3675,7 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
  */
 ExprState *
 ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
-				  bool doSort, bool doHash, bool nullcheck)
+				  int groupStrategy, bool nullcheck)
 {
 	ExprState  *state = makeNode(ExprState);
 	PlanState  *parent = &aggstate->ss.ps;
@@ -3933,7 +3933,7 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		 * grouping set). Do so for both sort and hash based computations, as
 		 * applicable.
 		 */
-		if (doSort)
+		if (groupStrategy & GROUPING_STRATEGY_SORT)
 		{
 			int			processGroupingSets = Max(phase->numsets, 1);
 			int			setoff = 0;
@@ -3941,13 +3941,13 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < processGroupingSets; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, false,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_SORT, nullcheck);
 				setoff++;
 			}
 		}
 
-		if (doHash)
+		if (groupStrategy & GROUPING_STRATEGY_HASH)
 		{
 			int			numHashes = aggstate->num_hashes;
 			int			setoff;
@@ -3961,12 +3961,19 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < numHashes; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, true,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_HASH, nullcheck);
 				setoff++;
 			}
 		}
 
+		if (groupStrategy & GROUPING_STRATEGY_INDEX)
+		{
+			ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
+								  pertrans, transno, 0, 0,
+								  GROUPING_STRATEGY_INDEX, nullcheck);
+		}
+
 		/* adjust early bail out jump target(s) */
 		foreach(bail, adjust_bailout)
 		{
@@ -4019,16 +4026,18 @@ static void
 ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 					  ExprEvalStep *scratch,
 					  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-					  int transno, int setno, int setoff, bool ishash,
+					  int transno, int setno, int setoff, int strategy,
 					  bool nullcheck)
 {
 	ExprContext *aggcontext;
 	int			adjust_jumpnull = -1;
 
-	if (ishash)
+	if (strategy & GROUPING_STRATEGY_HASH)
 		aggcontext = aggstate->hashcontext;
-	else
+	else if (strategy & GROUPING_STRATEGY_SORT)
 		aggcontext = aggstate->aggcontexts[setno];
+	else
+		aggcontext = aggstate->indexcontext;
 
 	/* add check for NULL pointer? */
 	if (nullcheck)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 0b02fd32107..4fc2a09b365 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -364,7 +364,7 @@ typedef struct FindColsContext
 	Bitmapset  *unaggregated;	/* other column references */
 } FindColsContext;
 
-static void select_current_set(AggState *aggstate, int setno, bool is_hash);
+static void select_current_set(AggState *aggstate, int setno, int strategy);
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
 static void initialize_aggregates(AggState *aggstate,
@@ -403,8 +403,8 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
 static bool find_cols_walker(Node *node, FindColsContext *context);
 static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
-static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
-										  bool nullcheck);
+static void agg_recompile_expressions(AggState *aggstate, bool minslot,
+									  bool nullcheck);
 static void hash_create_memory(AggState *aggstate);
 static double hash_choose_num_buckets(double hashentrysize,
 									  double ngroups, Size memory);
@@ -431,13 +431,13 @@ static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
 									   int64 input_tuples, double input_card,
 									   int used_bits);
 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
-static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
-							   int used_bits, double input_groups,
-							   double hashentrysize);
-static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-								TupleTableSlot *inputslot, uint32 hash);
-static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
-								 int setno);
+static void agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
+						   int used_bits, double input_groups,
+						   double hashentrysize);
+static Size agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+							TupleTableSlot *inputslot, uint32 hash);
+static void agg_spill_finish(AggState *aggstate, HashAggSpill *spill,
+							 int setno);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  AggState *aggstate, EState *estate,
@@ -446,21 +446,27 @@ static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  Oid aggdeserialfn, Datum initValue,
 									  bool initValueIsNull, Oid *inputTypes,
 									  int numArguments);
-
+static void agg_fill_index(AggState *state);
+static TupleTableSlot *agg_retrieve_index(AggState *state);
+static void lookup_index_entries(AggState *state);
+static void indexagg_finish_initial_spills(AggState *aggstate);
+static void index_agg_enter_spill_mode(AggState *aggstate);
 
 /*
  * Select the current grouping set; affects current_set and
  * curaggcontext.
  */
 static void
-select_current_set(AggState *aggstate, int setno, bool is_hash)
+select_current_set(AggState *aggstate, int setno, int strategy)
 {
 	/*
 	 * When changing this, also adapt ExecAggPlainTransByVal() and
 	 * ExecAggPlainTransByRef().
 	 */
-	if (is_hash)
+	if (strategy == GROUPING_STRATEGY_HASH)
 		aggstate->curaggcontext = aggstate->hashcontext;
+	else if (strategy == GROUPING_STRATEGY_INDEX)
+		aggstate->curaggcontext = aggstate->indexcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -680,7 +686,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerGroup pergroup = pergroups[setno];
 
-		select_current_set(aggstate, setno, false);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_SORT);
 
 		for (transno = 0; transno < numTrans; transno++)
 		{
@@ -1478,7 +1484,7 @@ build_hash_tables(AggState *aggstate)
 			continue;
 		}
 
-		memory = aggstate->hash_mem_limit / aggstate->num_hashes;
+		memory = aggstate->spill_mem_limit / aggstate->num_hashes;
 
 		/* choose reasonable number of buckets per hashtable */
 		nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
@@ -1496,7 +1502,7 @@ build_hash_tables(AggState *aggstate)
 		build_hash_table(aggstate, setno, nbuckets);
 	}
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 }
 
 /*
@@ -1728,7 +1734,7 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 }
 
 /*
- * hashagg_recompile_expressions()
+ * agg_recompile_expressions()
  *
  * Identifies the right phase, compiles the right expression given the
  * arguments, and then sets phase->evalfunc to that expression.
@@ -1746,34 +1752,47 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
  * expressions in the AggStatePerPhase, and reuse when appropriate.
  */
 static void
-hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
+agg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 {
 	AggStatePerPhase phase;
 	int			i = minslot ? 1 : 0;
 	int			j = nullcheck ? 1 : 0;
 
 	Assert(aggstate->aggstrategy == AGG_HASHED ||
-		   aggstate->aggstrategy == AGG_MIXED);
+		   aggstate->aggstrategy == AGG_MIXED ||
+		   aggstate->aggstrategy == AGG_INDEX);
 
-	if (aggstate->aggstrategy == AGG_HASHED)
-		phase = &aggstate->phases[0];
-	else						/* AGG_MIXED */
+	if (aggstate->aggstrategy == AGG_MIXED)
 		phase = &aggstate->phases[1];
+	else						/* AGG_HASHED or AGG_INDEX */
+		phase = &aggstate->phases[0];
 
 	if (phase->evaltrans_cache[i][j] == NULL)
 	{
 		const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
 		bool		outerfixed = aggstate->ss.ps.outeropsfixed;
-		bool		dohash = true;
-		bool		dosort = false;
+		int			strategy = 0;
 
-		/*
-		 * If minslot is true, that means we are processing a spilled batch
-		 * (inside agg_refill_hash_table()), and we must not advance the
-		 * sorted grouping sets.
-		 */
-		if (aggstate->aggstrategy == AGG_MIXED && !minslot)
-			dosort = true;
+		switch (aggstate->aggstrategy)
+		{
+			case AGG_MIXED:
+				/*
+				 * If minslot is true, that means we are processing a spilled batch
+				 * (inside agg_refill_hash_table()), and we must not advance the
+				 * sorted grouping sets.
+				 */
+				if (!minslot)
+					strategy |= GROUPING_STRATEGY_SORT;
+				/* FALLTHROUGH */
+			case AGG_HASHED:
+				strategy |= GROUPING_STRATEGY_HASH;
+				break;
+			case AGG_INDEX:
+				strategy |= GROUPING_STRATEGY_INDEX;
+				break;
+			default:
+				Assert(false);
+		}
 
 		/* temporarily change the outerops while compiling the expression */
 		if (minslot)
@@ -1783,8 +1802,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 		}
 
 		phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
-														 dosort, dohash,
-														 nullcheck);
+														 strategy, nullcheck);
 
 		/* change back */
 		aggstate->ss.ps.outerops = outerops;
@@ -1803,9 +1821,9 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
  * substantially larger than the initial value.
  */
 void
-hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
-					Size *mem_limit, uint64 *ngroups_limit,
-					int *num_partitions)
+agg_set_limits(double hashentrysize, double input_groups, int used_bits,
+			   Size *mem_limit, uint64 *ngroups_limit,
+			   int *num_partitions)
 {
 	int			npartitions;
 	Size		partition_mem;
@@ -1853,6 +1871,18 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		*ngroups_limit = 1;
 }
 
+static inline bool
+agg_spill_required(AggState *aggstate, Size total_mem)
+{
+	/*
+	 * Don't spill unless there's at least one group in the hash table so we
+	 * can be sure to make progress even in edge cases.
+	 */
+	return aggstate->spill_ngroups_current > 0 &&
+			(total_mem > aggstate->spill_mem_limit ||
+			 aggstate->spill_ngroups_current > aggstate->spill_ngroups_limit);
+}
+
 /*
  * hash_agg_check_limits
  *
@@ -1863,7 +1893,6 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 static void
 hash_agg_check_limits(AggState *aggstate)
 {
-	uint64		ngroups = aggstate->hash_ngroups_current;
 	Size		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
 													 true);
 	Size		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt,
@@ -1874,7 +1903,7 @@ hash_agg_check_limits(AggState *aggstate)
 	bool		do_spill = false;
 
 #ifdef USE_INJECTION_POINTS
-	if (ngroups >= 1000)
+	if (aggstate->spill_ngroups_current >= 1000)
 	{
 		if (IS_INJECTION_POINT_ATTACHED("hash-aggregate-spill-1000"))
 		{
@@ -1888,9 +1917,7 @@ hash_agg_check_limits(AggState *aggstate)
 	 * Don't spill unless there's at least one group in the hash table so we
 	 * can be sure to make progress even in edge cases.
 	 */
-	if (aggstate->hash_ngroups_current > 0 &&
-		(total_mem > aggstate->hash_mem_limit ||
-		 ngroups > aggstate->hash_ngroups_limit))
+	if (agg_spill_required(aggstate, total_mem))
 	{
 		do_spill = true;
 	}
@@ -1899,97 +1926,199 @@ hash_agg_check_limits(AggState *aggstate)
 		hash_agg_enter_spill_mode(aggstate);
 }
 
+static void
+index_agg_check_limits(AggState *aggstate)
+{
+	Size		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt,
+													 true);
+	Size		node_mem = MemoryContextMemAllocated(aggstate->index_nodecxt,
+													 true);
+	Size		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt,
+													  true);
+	Size		tval_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory,
+													 true);
+	Size		total_mem = meta_mem + node_mem + entry_mem + tval_mem;
+	bool		do_spill = false;
+
+#ifdef USE_INJECTION_POINTS
+	if (aggstate->spill_ngroups_current >= 1000)
+	{
+		if (IS_INJECTION_POINT_ATTACHED("index-aggregate-spill-1000"))
+		{
+			do_spill = true;
+			INJECTION_POINT_CACHED("index-aggregate-spill-1000", NULL);
+		}
+	}
+#endif
+
+	if (agg_spill_required(aggstate, total_mem))
+	{
+		do_spill = true;
+	}
+
+	if (do_spill)
+		index_agg_enter_spill_mode(aggstate);
+}
+
 /*
  * Enter "spill mode", meaning that no new groups are added to any of the hash
  * tables. Tuples that would create a new group are instead spilled, and
  * processed later.
  */
-static void
-hash_agg_enter_spill_mode(AggState *aggstate)
+static inline void
+agg_enter_spill_mode(AggState *aggstate, bool ishash)
 {
-	INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
-	aggstate->hash_spill_mode = true;
-	hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
-
-	if (!aggstate->hash_ever_spilled)
+	if (ishash)
 	{
-		Assert(aggstate->hash_tapeset == NULL);
-		Assert(aggstate->hash_spills == NULL);
-
-		aggstate->hash_ever_spilled = true;
-
-		aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
+		INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->table_filled, true);
+	}
+	else
+	{
+		INJECTION_POINT("index-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->index_filled, true);
+	}
+
+	if (!aggstate->spill_ever_happened)
+	{
+		Assert(aggstate->spill_tapeset == NULL);
+		Assert(aggstate->spills == NULL);
 
-		aggstate->hash_spills = palloc(sizeof(HashAggSpill) * aggstate->num_hashes);
+		aggstate->spill_ever_happened = true;
+		aggstate->spill_tapeset = LogicalTapeSetCreate(true, NULL, -1);
 
-		for (int setno = 0; setno < aggstate->num_hashes; setno++)
+		if (ishash)
 		{
-			AggStatePerHash perhash = &aggstate->perhash[setno];
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
-
-			hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
+			aggstate->spills = palloc(sizeof(HashAggSpill) * aggstate->num_hashes);
+
+			for (int setno = 0; setno < aggstate->num_hashes; setno++)
+			{
+				AggStatePerHash perhash = &aggstate->perhash[setno];
+				HashAggSpill *spill = &aggstate->spills[setno];
+
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
 							   perhash->aggnode->numGroups,
 							   aggstate->hashentrysize);
+			}
+		}
+		else
+		{
+			aggstate->spills = palloc(sizeof(HashAggSpill));
+			agg_spill_init(aggstate->spills, aggstate->spill_tapeset, 0,
+						   aggstate->perindex->aggnode->numGroups,
+						   aggstate->hashentrysize);
 		}
 	}
 }
 
+static void
+hash_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, true);
+}
+
+static void
+index_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, false);
+}
+
 /*
  * Update metrics after filling the hash table.
  *
  * If reading from the outer plan, from_tape should be false; if reading from
  * another tape, from_tape should be true.
  */
-static void
-hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+static inline void
+agg_update_spill_metrics(AggState *aggstate, bool from_tape, int npartitions, bool ishash)
 {
 	Size		meta_mem;
 	Size		entry_mem;
-	Size		hashkey_mem;
+	Size		key_mem;
 	Size		buffer_mem;
 	Size		total_mem;
 
 	if (aggstate->aggstrategy != AGG_MIXED &&
-		aggstate->aggstrategy != AGG_HASHED)
+		aggstate->aggstrategy != AGG_HASHED &&
+		aggstate->aggstrategy != AGG_INDEX)
 		return;
 
-	/* memory for the hash table itself */
-	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
-
-	/* memory for hash entries */
-	entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
-
-	/* memory for byref transition states */
-	hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
-
+	if (ishash)
+	{
+		/* memory for the hash table itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
+
+		/* memory for hash entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	}
+	else
+	{
+		/* memory for the index itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt, true);
+
+		/* memory for the index nodes */
+		meta_mem += MemoryContextMemAllocated(aggstate->index_nodecxt, true);
+
+		/* memory for index entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory, true);
+	}
 	/* memory for read/write tape buffers, if spilled */
 	buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
 	if (from_tape)
 		buffer_mem += HASHAGG_READ_BUFFER_SIZE;
 
 	/* update peak mem */
-	total_mem = meta_mem + entry_mem + hashkey_mem + buffer_mem;
-	if (total_mem > aggstate->hash_mem_peak)
-		aggstate->hash_mem_peak = total_mem;
+	total_mem = meta_mem + entry_mem + key_mem + buffer_mem;
+	if (total_mem > aggstate->spill_mem_peak)
+		aggstate->spill_mem_peak = total_mem;
 
 	/* update disk usage */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		uint64		disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
+		uint64		disk_used = LogicalTapeSetBlocks(aggstate->spill_tapeset) * (BLCKSZ / 1024);
 
-		if (aggstate->hash_disk_used < disk_used)
-			aggstate->hash_disk_used = disk_used;
+		if (aggstate->spill_disk_used < disk_used)
+			aggstate->spill_disk_used = disk_used;
 	}
 
 	/* update hashentrysize estimate based on contents */
-	if (aggstate->hash_ngroups_current > 0)
+	if (aggstate->spill_ngroups_current > 0)
 	{
-		aggstate->hashentrysize =
-			TupleHashEntrySize() +
-			(hashkey_mem / (double) aggstate->hash_ngroups_current);
+		if (ishash)
+		{
+			aggstate->hashentrysize =
+				TupleHashEntrySize() +
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
+		else
+		{
+			/* index stores MinimalTuples directly without any wrapper */
+			aggstate->hashentrysize = 
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
 	}
 }
 
+static void
+hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, true);
+}
+
+static void
+index_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, false);
+}
+
 /*
  * Create memory contexts used for hash aggregation.
  */
@@ -2048,6 +2177,33 @@ hash_create_memory(AggState *aggstate)
 
 }
 
+/*
+ * Create memory contexts used for index aggregation.
+ */
+static void
+index_create_memory(AggState *aggstate)
+{
+	Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
+	
+	aggstate->indexcontext = CreateWorkExprContext(aggstate->ss.ps.state);
+	
+	aggstate->index_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													"IndexAgg meta context",
+													ALLOCSET_DEFAULT_SIZES);
+	aggstate->index_nodecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg node context",
+												ALLOCSET_SMALL_SIZES);
+
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+	aggstate->index_entrycxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													 "IndexAgg table context",
+													 ALLOCSET_DEFAULT_MINSIZE,
+													 ALLOCSET_DEFAULT_INITSIZE,
+													 maxBlockSize);
+}
+
 /*
  * Choose a reasonable number of buckets for the initial hash table size.
  */
@@ -2141,7 +2297,7 @@ initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
 	AggStatePerGroup pergroup;
 	int			transno;
 
-	aggstate->hash_ngroups_current++;
+	aggstate->spill_ngroups_current++;
 	hash_agg_check_limits(aggstate);
 
 	/* no need to allocate or initialize per-group state */
@@ -2196,9 +2352,9 @@ lookup_hash_entries(AggState *aggstate)
 		bool	   *p_isnew;
 
 		/* if hash table already spilled, don't create new entries */
-		p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
-		select_current_set(aggstate, setno, true);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_HASH);
 		prepare_hash_slot(perhash,
 						  outerslot,
 						  hashslot);
@@ -2214,15 +2370,15 @@ lookup_hash_entries(AggState *aggstate)
 		}
 		else
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 			TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
 
 			if (spill->partitions == NULL)
-				hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
-								   perhash->aggnode->numGroups,
-								   aggstate->hashentrysize);
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perhash->aggnode->numGroups,
+							   aggstate->hashentrysize);
 
-			hashagg_spill_tuple(aggstate, spill, slot, hash);
+			agg_spill_tuple(aggstate, spill, slot, hash);
 			pergroup[setno] = NULL;
 		}
 	}
@@ -2265,6 +2421,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_INDEX:
+				if (!node->index_filled)
+					agg_fill_index(node);
+
+				result = agg_retrieve_index(node);
+				break;
 		}
 
 		if (!TupIsNull(result))
@@ -2381,7 +2543,7 @@ agg_retrieve_direct(AggState *aggstate)
 				aggstate->table_filled = true;
 				ResetTupleHashIterator(aggstate->perhash[0].hashtable,
 									   &aggstate->perhash[0].hashiter);
-				select_current_set(aggstate, 0, true);
+				select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
 				return agg_retrieve_hash_table(aggstate);
 			}
 			else
@@ -2601,7 +2763,7 @@ agg_retrieve_direct(AggState *aggstate)
 
 		prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
 
-		select_current_set(aggstate, currentSet, false);
+		select_current_set(aggstate, currentSet, GROUPING_STRATEGY_SORT);
 
 		finalize_aggregates(aggstate,
 							peragg,
@@ -2683,19 +2845,19 @@ agg_refill_hash_table(AggState *aggstate)
 	HashAggBatch *batch;
 	AggStatePerHash perhash;
 	HashAggSpill spill;
-	LogicalTapeSet *tapeset = aggstate->hash_tapeset;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
 	bool		spill_initialized = false;
 
-	if (aggstate->hash_batches == NIL)
+	if (aggstate->spill_batches == NIL)
 		return false;
 
 	/* hash_batches is a stack, with the top item at the end of the list */
-	batch = llast(aggstate->hash_batches);
-	aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
+	batch = llast(aggstate->spill_batches);
+	aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
 
-	hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
-						batch->used_bits, &aggstate->hash_mem_limit,
-						&aggstate->hash_ngroups_limit, NULL);
+	agg_set_limits(aggstate->hashentrysize, batch->input_card,
+				   batch->used_bits, &aggstate->spill_mem_limit,
+				   &aggstate->spill_ngroups_limit, NULL);
 
 	/*
 	 * Each batch only processes one grouping set; set the rest to NULL so
@@ -2712,7 +2874,7 @@ agg_refill_hash_table(AggState *aggstate)
 	for (int setno = 0; setno < aggstate->num_hashes; setno++)
 		ResetTupleHashTable(aggstate->perhash[setno].hashtable);
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 
 	/*
 	 * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
@@ -2726,7 +2888,7 @@ agg_refill_hash_table(AggState *aggstate)
 		aggstate->phase = &aggstate->phases[aggstate->current_phase];
 	}
 
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 
 	perhash = &aggstate->perhash[aggstate->current_set];
 
@@ -2737,19 +2899,19 @@ agg_refill_hash_table(AggState *aggstate)
 	 * We still need the NULL check, because we are only processing one
 	 * grouping set at a time and the rest will be NULL.
 	 */
-	hashagg_recompile_expressions(aggstate, true, true);
+	agg_recompile_expressions(aggstate, true, true);
 
 	INJECTION_POINT("hash-aggregate-process-batch", NULL);
 	for (;;)
 	{
-		TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
+		TupleTableSlot *spillslot = aggstate->spill_rslot;
 		TupleTableSlot *hashslot = perhash->hashslot;
 		TupleHashTable hashtable = perhash->hashtable;
 		TupleHashEntry entry;
 		MinimalTuple tuple;
 		uint32		hash;
 		bool		isnew = false;
-		bool	   *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		bool	   *p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2782,11 +2944,11 @@ agg_refill_hash_table(AggState *aggstate)
 				 * that we don't assign tapes that will never be used.
 				 */
 				spill_initialized = true;
-				hashagg_spill_init(&spill, tapeset, batch->used_bits,
-								   batch->input_card, aggstate->hashentrysize);
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
 			}
 			/* no memory for a new group, spill */
-			hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
 
 			aggstate->hash_pergroup[batch->setno] = NULL;
 		}
@@ -2806,16 +2968,16 @@ agg_refill_hash_table(AggState *aggstate)
 
 	if (spill_initialized)
 	{
-		hashagg_spill_finish(aggstate, &spill, batch->setno);
+		agg_spill_finish(aggstate, &spill, batch->setno);
 		hash_agg_update_metrics(aggstate, true, spill.npartitions);
 	}
 	else
 		hash_agg_update_metrics(aggstate, true, 0);
 
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 
 	/* prepare to walk the first hash table */
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 	ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
 						   &aggstate->perhash[batch->setno].hashiter);
 
@@ -2975,14 +3137,14 @@ agg_retrieve_hash_table_in_memory(AggState *aggstate)
 }
 
 /*
- * hashagg_spill_init
+ * agg_spill_init
  *
  * Called after we determined that spilling is necessary. Chooses the number
  * of partitions to create, and initializes them.
  */
 static void
-hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
-				   double input_groups, double hashentrysize)
+agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
+			   double input_groups, double hashentrysize)
 {
 	int			npartitions;
 	int			partition_bits;
@@ -3018,14 +3180,13 @@ hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
 }
 
 /*
- * hashagg_spill_tuple
+ * agg_spill_tuple
  *
- * No room for new groups in the hash table. Save for later in the appropriate
- * partition.
+ * No room for new groups in memory. Save for later in the appropriate partition.
  */
 static Size
-hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-					TupleTableSlot *inputslot, uint32 hash)
+agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+				TupleTableSlot *inputslot, uint32 hash)
 {
 	TupleTableSlot *spillslot;
 	int			partition;
@@ -3039,7 +3200,7 @@ hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
 	/* spill only attributes that we actually need */
 	if (!aggstate->all_cols_needed)
 	{
-		spillslot = aggstate->hash_spill_wslot;
+		spillslot = aggstate->spill_wslot;
 		slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
 		ExecClearTuple(spillslot);
 		for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
@@ -3167,14 +3328,14 @@ hashagg_finish_initial_spills(AggState *aggstate)
 	int			setno;
 	int			total_npartitions = 0;
 
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			total_npartitions += spill->npartitions;
-			hashagg_spill_finish(aggstate, spill, setno);
+			agg_spill_finish(aggstate, spill, setno);
 		}
 
 		/*
@@ -3182,21 +3343,21 @@ hashagg_finish_initial_spills(AggState *aggstate)
 		 * processing batches of spilled tuples. The initial spill structures
 		 * are no longer needed.
 		 */
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	hash_agg_update_metrics(aggstate, false, total_npartitions);
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 }
 
 /*
- * hashagg_spill_finish
+ * agg_spill_finish
  *
  * Transform spill partitions into new batches.
  */
 static void
-hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
+agg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 {
 	int			i;
 	int			used_bits = 32 - spill->shift;
@@ -3223,8 +3384,8 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 		new_batch = hashagg_batch_new(tape, setno,
 									  spill->ntuples[i], cardinality,
 									  used_bits);
-		aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
-		aggstate->hash_batches_used++;
+		aggstate->spill_batches = lappend(aggstate->spill_batches, new_batch);
+		aggstate->spill_batches_used++;
 	}
 
 	pfree(spill->ntuples);
@@ -3239,33 +3400,670 @@ static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
 	/* free spills from initial pass */
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		int			setno;
 
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			pfree(spill->ntuples);
 			pfree(spill->partitions);
 		}
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	/* free batches */
-	list_free_deep(aggstate->hash_batches);
-	aggstate->hash_batches = NIL;
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
 
 	/* close tape set */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		LogicalTapeSetClose(aggstate->hash_tapeset);
-		aggstate->hash_tapeset = NULL;
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
 	}
 }
+static void
+agg_fill_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *tmpcontext = aggstate->tmpcontext;
+	
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *outerslot;
+
+		outerslot = fetch_input_tuple(aggstate);
+		if (TupIsNull(outerslot))
+			break;
+
+		/* set up for lookup_index_entries and advance_aggregates */
+		tmpcontext->ecxt_outertuple = outerslot;
 
+		/* insert the input tuple into the index, possibly spilling to disk */
+		lookup_index_entries(aggstate);
+
+		/* Advance the aggregates (or combine functions) */
+		advance_aggregates(aggstate);
+
+		/*
+		 * Reset per-input-tuple context after each tuple, but note that the
+		 * index lookups do this too
+		 */
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	/*
+	 * Mark the index as filled, so that after expression recompilation the
+	 * expressions will expect a MinimalTuple instead of the outer plan's
+	 * tuple type.
+	 */
+	aggstate->index_filled = true;
+
+	indexagg_finish_initial_spills(aggstate);
+
+	/*
+	 * This is only useful when no spill occurred and projection happens in
+	 * memory, but initialize it regardless.
+	 */
+	select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
+	InitTupleIndexIterator(perindex->index, &perindex->iter);
+}
+
+/*
+ * Extract the attributes that make up the grouping key into the
+ * indexslot. This is necessary to perform comparisons in the index.
+ */
+static void
+prepare_index_slot(AggStatePerIndex perindex,
+				   TupleTableSlot *inputslot,
+				   TupleTableSlot *indexslot)
+{
+	slot_getsomeattrs(inputslot, perindex->largestGrpColIdx);
+	ExecClearTuple(indexslot);
+	
+	for (int i = 0; i < perindex->numCols; ++i)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		indexslot->tts_values[i] = inputslot->tts_values[varNumber];
+		indexslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
+	}
+	ExecStoreVirtualTuple(indexslot);
+}
+
+static void
+indexagg_reset_spill_state(AggState *aggstate)
+{
+	/* free spills from initial pass */
+	if (aggstate->spills != NULL)
+	{
+		HashAggSpill *spill = &aggstate->spills[0];
+		pfree(spill->ntuples);
+		pfree(spill->partitions);
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
+	}
+
+	/* free batches */
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
+
+	/* close tape set */
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * Initialize a freshly-created MinimalTuple in the index.
+ */
+static void
+initialize_index_entry(AggState *aggstate, TupleIndex index, TupleIndexEntry entry)
+{
+	AggStatePerGroup pergroup;
+
+	aggstate->spill_ngroups_current++;
+	index_agg_check_limits(aggstate);
+
+	/* no need to allocate or initialize per-group state */
+	if (aggstate->numtrans == 0)
+		return;		
+
+	pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(index, entry);
+	
+	/*
+	 * Initialize aggregates for the new tuple group; lookup_index_entries()
+	 * has already selected the relevant grouping set.
+	 */
+	for (int transno = 0; transno < aggstate->numtrans; ++transno)
+	{
+		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		AggStatePerGroup pergroupstate = &pergroup[transno];
+		
+		initialize_aggregate(aggstate, pertrans, pergroupstate);
+	}
+}
+
+/*
+ * Create a new sorted run from the current in-memory index.
+ */
+static void
+indexagg_save_index_run(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *econtext;
+	TupleIndexIteratorData iter;
+	AggStatePerAgg peragg;
+	TupleTableSlot *firstSlot;
+	TupleIndexEntry entry;
+	TupleTableSlot *indexslot;
+	AggStatePerGroup pergroup;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	indexslot = perindex->indexslot;
+
+	InitTupleIndexIterator(perindex->index, &iter);
+	
+	tuplemerge_start_run(aggstate->mergestate);
+
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+	{
+		MinimalTuple tuple = TupleIndexEntryGetMinimalTuple(entry);
+		TupleTableSlot *output;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(tuple, indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		output = project_aggregates(aggstate);
+		if (output)
+			tuplemerge_puttupleslot(aggstate->mergestate, output);
+	}
+
+	tuplemerge_end_run(aggstate->mergestate);
+}
+
+/*
+ * Fill the index with tuples from the given batch.
+ */
+static void
+indexagg_refill_batch(AggState *aggstate, HashAggBatch *batch)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *spillslot = aggstate->spill_rslot;
+	TupleTableSlot *indexslot = perindex->indexslot;
+	TupleIndex index = perindex->index;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
+	HashAggSpill spill;
+	bool	spill_initialized = false;
+	int nspill = 0;
+	
+	agg_set_limits(aggstate->hashentrysize, batch->input_card, batch->used_bits,
+				   &aggstate->spill_mem_limit, &aggstate->spill_ngroups_limit, NULL);
+
+	ReScanExprContext(aggstate->indexcontext);
+
+	MemoryContextReset(aggstate->index_entrycxt);
+	MemoryContextReset(aggstate->index_nodecxt);
+	ResetTupleIndex(perindex->index);
+
+	aggstate->spill_ngroups_current = 0;
+
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	agg_recompile_expressions(aggstate, true, true);
+
+	for (;;)
+	{
+		MinimalTuple tuple;
+		TupleIndexEntry entry;
+		bool		isnew = false;
+		bool	   *p_isnew;
+		uint32		hash;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		tuple = hashagg_batch_read(batch, &hash);
+		if (tuple == NULL)
+			break;
+
+		ExecStoreMinimalTuple(tuple, spillslot, true);
+		aggstate->tmpcontext->ecxt_outertuple = spillslot;
+
+		prepare_index_slot(perindex, spillslot, indexslot);
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		entry = TupleIndexLookup(index, indexslot, p_isnew);
+
+		if (entry != NULL)
+		{
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			aggstate->all_pergroups[batch->setno] = TupleIndexEntryGetAdditional(index, entry);
+			advance_aggregates(aggstate);
+		}
+		else
+		{
+			if (!spill_initialized)
+			{
+				spill_initialized = true;
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
+			}
+			nspill++;
+
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
+			aggstate->all_pergroups[batch->setno] = NULL;
+		}
+		
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	LogicalTapeClose(batch->input_tape);
+
+	if (spill_initialized)
+	{
+		agg_spill_finish(aggstate, &spill, 0);
+		index_agg_update_metrics(aggstate, true, spill.npartitions);
+	}
+	else
+		index_agg_update_metrics(aggstate, true, 0);
+
+	aggstate->spill_mode = false;
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	pfree(batch);
+}
+
+static void
+indexagg_finish_initial_spills(AggState *aggstate)
+{
+	HashAggSpill *spill;
+	AggStatePerIndex perindex;
+	Sort		 *sort;
+
+	if (!aggstate->spill_ever_happened)
+		return;
+
+	Assert(aggstate->spills != NULL);
+
+	spill = aggstate->spills;
+	agg_spill_finish(aggstate, aggstate->spills, 0);
+
+	index_agg_update_metrics(aggstate, false, spill->npartitions);
+	aggstate->spill_mode = false;
+
+	pfree(aggstate->spills);
+	aggstate->spills = NULL;
+
+	perindex = aggstate->perindex;
+	sort = aggstate->index_sort;
+	aggstate->mergestate = tuplemerge_begin_heap(aggstate->ss.ps.ps_ResultTupleDesc,
+												 perindex->numKeyCols,
+												 perindex->idxKeyColIdxTL,
+												 sort->sortOperators,
+												 sort->collations,
+												 sort->nullsFirst,
+												 work_mem, NULL);
+	/*
+	 * Some data was spilled.  Index Aggregate requires its output to be
+	 * sorted, so we must now process all remaining spilled data and produce
+	 * sorted runs for the external merge.  The first saved run comes from
+	 * the currently open in-memory index.
+	 */
+	indexagg_save_index_run(aggstate);
+
+	while (aggstate->spill_batches != NIL)
+	{
+		HashAggBatch *batch = llast(aggstate->spill_batches);
+		aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
+
+		indexagg_refill_batch(aggstate, batch);
+		indexagg_save_index_run(aggstate);
+	}
+
+	tuplemerge_performmerge(aggstate->mergestate);
+}
+
+static uint32
+index_calculate_input_slot_hash(AggState *aggstate,
+								TupleTableSlot *inputslot)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext oldcxt;
+	uint32 hash;
+	bool isnull;
+	
+	oldcxt = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+	
+	perindex->exprcontext->ecxt_innertuple = inputslot;
+	hash = DatumGetUInt32(ExecEvalExpr(perindex->indexhashexpr,
+									   perindex->exprcontext,
+									   &isnull));
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return hash;
+}
+
+/*
+ * lookup_index_entries
+ *
+ * Insert the input tuple into the in-memory index, spilling to disk if
+ * there is no room for a new group.
+ */
+static void
+lookup_index_entries(AggState *aggstate)
+{
+	int numGroupingSets = Max(aggstate->maxsets, 1);
+	AggStatePerGroup *pergroup = aggstate->all_pergroups;
+	TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
+
+	for (int setno = 0; setno < numGroupingSets; ++setno)
+	{
+		AggStatePerIndex	perindex = &aggstate->perindex[setno];
+		TupleIndex		index = perindex->index;
+		TupleTableSlot *indexslot = perindex->indexslot;
+		TupleIndexEntry	entry;
+		bool			isnew = false;
+		bool		   *p_isnew;
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_INDEX);
+
+		prepare_index_slot(perindex, outerslot, indexslot);
+
+		/* Lookup entry in btree */
+		entry = TupleIndexLookup(perindex->index, indexslot, p_isnew);
+
+		/* A non-NULL entry means the group fits in memory - no disk spill */
+		if (entry != NULL)
+		{
+			/* Initialize its trans state if just created */
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			pergroup[setno] = TupleIndexEntryGetAdditional(index, entry);
+		}
+		else
+		{
+			HashAggSpill *spill = &aggstate->spills[setno];
+			uint32 hash;
+			
+			if (spill->partitions == NULL)
+			{
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perindex->aggnode->numGroups,
+							   aggstate->hashentrysize);
+			}
+
+			hash = index_calculate_input_slot_hash(aggstate, indexslot);
+			agg_spill_tuple(aggstate, spill, outerslot, hash);
+			pergroup[setno] = NULL;
+		}
+	}
+}
+
+static TupleTableSlot *
+agg_retrieve_index_in_memory(AggState *aggstate)
+{
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+	AggStatePerIndex perindex;
+	AggStatePerAgg peragg;
+	AggStatePerGroup pergroup;
+	TupleTableSlot *result;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	perindex = &aggstate->perindex[aggstate->current_set];
+
+	for (;;)
+	{
+		TupleIndexEntry entry;
+		TupleTableSlot *indexslot = perindex->indexslot;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		entry = TupleIndexIteratorNext(&perindex->iter);
+		if (entry == NULL)
+			return NULL;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(TupleIndexEntryGetMinimalTuple(entry), indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+		
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+		
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		result = project_aggregates(aggstate);
+		if (result)
+			return result;
+	}
+	
+	/* no more groups */
+	return NULL;
+}
+
+static TupleTableSlot *
+agg_retrieve_index_merge(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *slot = perindex->mergeslot;
+	TupleTableSlot *resultslot = aggstate->ss.ps.ps_ResultTupleSlot;
+	
+	ExecClearTuple(slot);
+	
+	if (!tuplesort_gettupleslot(aggstate->mergestate, true, true, slot, NULL))
+		return NULL;
+
+	slot_getallattrs(slot);
+	ExecClearTuple(resultslot);
+	
+	for (int i = 0; i < resultslot->tts_tupleDescriptor->natts; ++i)
+	{
+		resultslot->tts_values[i] = slot->tts_values[i];
+		resultslot->tts_isnull[i] = slot->tts_isnull[i];
+	}
+	ExecStoreVirtualTuple(resultslot);
+
+	return resultslot;
+}
+
+static TupleTableSlot *
+agg_retrieve_index(AggState *aggstate)
+{
+	if (aggstate->spill_ever_happened)
+		return agg_retrieve_index_merge(aggstate);
+	else
+		return agg_retrieve_index_in_memory(aggstate);
+}
+
+static void
+build_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext metacxt = aggstate->index_metacxt;
+	MemoryContext entrycxt = aggstate->index_entrycxt;
+	MemoryContext nodecxt = aggstate->index_nodecxt;
+	MemoryContext oldcxt;
+	Size	additionalsize;
+	Oid	   *eqfuncoids;
+	Sort   *sort;
+
+	Assert(aggstate->aggstrategy == AGG_INDEX);
+
+	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
+	sort = aggstate->index_sort;
+
+	/* in-memory index */
+	perindex->index = BuildTupleIndex(perindex->indexslot->tts_tupleDescriptor,
+									  perindex->numKeyCols,
+									  perindex->idxKeyColIdxIndex,
+									  sort->sortOperators,
+									  sort->collations,
+									  sort->nullsFirst,
+									  additionalsize,
+									  metacxt,
+									  entrycxt,
+									  nodecxt);
+
+	/* disk spill logic */
+	oldcxt = MemoryContextSwitchTo(metacxt);
+	execTuplesHashPrepare(perindex->numKeyCols, perindex->aggnode->grpOperators,
+						  &eqfuncoids, &perindex->hashfunctions);
+	perindex->indexhashexpr =
+		ExecBuildHash32FromAttrs(perindex->indexslot->tts_tupleDescriptor,
+								 perindex->indexslot->tts_ops,
+								 perindex->hashfunctions,
+								 perindex->aggnode->grpCollations,
+								 perindex->numKeyCols,
+								 perindex->idxKeyColIdxIndex,
+								 &aggstate->ss.ps,
+								 0);
+	perindex->exprcontext = CreateStandaloneExprContext();
+	MemoryContextSwitchTo(oldcxt);
+}
+
+static void
+find_index_columns(AggState *aggstate)
+{
+	Bitmapset  *base_colnos;
+	Bitmapset  *aggregated_colnos;
+	TupleDesc	scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	List	   *outerTlist = outerPlanState(aggstate)->plan->targetlist;
+	EState	   *estate = aggstate->ss.ps.state;
+	AggStatePerIndex perindex;
+	Bitmapset  *colnos;
+	AttrNumber *sortColIdx;
+	List	   *indexTlist = NIL;
+	TupleDesc   indexDesc;
+	int			maxCols;
+	int			i;
+
+	find_cols(aggstate, &aggregated_colnos, &base_colnos);
+	aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
+	aggstate->max_colno_needed = 0;
+	aggstate->all_cols_needed = true;
+
+	for (i = 0; i < scanDesc->natts; i++)
+	{
+		int		colno = i + 1;
+
+		if (bms_is_member(colno, aggstate->colnos_needed))
+			aggstate->max_colno_needed = colno;
+		else
+			aggstate->all_cols_needed = false;
+	}
+
+	perindex = aggstate->perindex;
+	colnos = bms_copy(base_colnos);
+
+	if (aggstate->phases[0].grouped_cols)
+	{
+		Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[0];
+		ListCell  *lc;
+		foreach(lc, aggstate->all_grouped_cols)
+		{
+			int attnum = lfirst_int(lc);
+			if (!bms_is_member(attnum, grouped_cols))
+				colnos = bms_del_member(colnos, attnum);
+		}
+	}
+
+	maxCols = bms_num_members(colnos) + perindex->numKeyCols;
+
+	perindex->idxKeyColIdxInput = palloc(maxCols * sizeof(AttrNumber));
+	perindex->idxKeyColIdxIndex = palloc(perindex->numKeyCols * sizeof(AttrNumber));
+
+	/* Add all the sorting/grouping columns to colnos */
+	sortColIdx = aggstate->index_sort->sortColIdx;
+	for (i = 0; i < perindex->numKeyCols; i++)
+		colnos = bms_add_member(colnos, sortColIdx[i]);
+	
+	for (i = 0; i < perindex->numKeyCols; i++)
+	{
+		perindex->idxKeyColIdxInput[i] = sortColIdx[i];
+		perindex->idxKeyColIdxIndex[i] = i + 1;
+
+		perindex->numCols++;
+		/* delete already mapped columns */
+		colnos = bms_del_member(colnos, sortColIdx[i]);
+	}
+	
+	/* and the remaining columns */
+	i = -1;
+	while ((i = bms_next_member(colnos, i)) >= 0)
+	{
+		perindex->idxKeyColIdxInput[perindex->numCols] = i;
+		perindex->numCols++;
+	}
+
+	/* build tuple descriptor for the index */
+	perindex->largestGrpColIdx = 0;
+	for (i = 0; i < perindex->numCols; i++)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		
+		indexTlist = lappend(indexTlist, list_nth(outerTlist, varNumber));
+		perindex->largestGrpColIdx = Max(varNumber + 1, perindex->largestGrpColIdx);
+	}
+
+	indexDesc = ExecTypeFromTL(indexTlist);
+	perindex->indexslot = ExecAllocTableSlot(&estate->es_tupleTable, indexDesc,
+										   &TTSOpsMinimalTuple);
+	list_free(indexTlist);
+	bms_free(colnos);
+
+	bms_free(base_colnos);
+}
 
 /* -----------------
  * ExecInitAgg
@@ -3297,10 +4095,12 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	int			numGroupingSets = 1;
 	int			numPhases;
 	int			numHashes;
+	int			numIndexes;
 	int			i = 0;
 	int			j = 0;
 	bool		use_hashing = (node->aggstrategy == AGG_HASHED ||
 							   node->aggstrategy == AGG_MIXED);
+	bool		use_index = (node->aggstrategy == AGG_INDEX);
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -3337,6 +4137,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	 */
 	numPhases = (use_hashing ? 1 : 2);
 	numHashes = (use_hashing ? 1 : 0);
+	numIndexes = (use_index ? 1 : 0);
 
 	/*
 	 * Calculate the maximum number of grouping sets in any phase; this
@@ -3356,7 +4157,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 			/*
 			 * additional AGG_HASHED aggs become part of phase 0, but all
-			 * others add an extra phase.
+			 * others add an extra phase.  AGG_INDEX does not support grouping
+			 * sets, so else branch must be AGG_SORTED or AGG_MIXED.
 			 */
 			if (agg->aggstrategy != AGG_HASHED)
 				++numPhases;
@@ -3396,6 +4198,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (use_hashing)
 		hash_create_memory(aggstate);
+	else if (use_index)
+		index_create_memory(aggstate);
 
 	ExecAssignExprContext(estate, &aggstate->ss.ps);
 
@@ -3502,6 +4306,13 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		aggstate->phases[0].gset_lengths = palloc(numHashes * sizeof(int));
 		aggstate->phases[0].grouped_cols = palloc(numHashes * sizeof(Bitmapset *));
 	}
+	else if (numIndexes)
+	{
+		aggstate->perindex = palloc0(sizeof(AggStatePerIndexData) * numIndexes);
+		aggstate->phases[0].numsets = 0;
+		aggstate->phases[0].gset_lengths = palloc(numIndexes * sizeof(int));
+		aggstate->phases[0].grouped_cols = palloc(numIndexes * sizeof(Bitmapset *));
+	}
 
 	phase = 0;
 	for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
@@ -3514,6 +4325,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
 			sortnode = castNode(Sort, outerPlan(aggnode));
 		}
+		else if (use_index)
+		{
+			Assert(list_length(node->chain) == 1);
+
+			aggnode = node;
+			sortnode = castNode(Sort, linitial(node->chain));
+			/*
+			 * The chain contains a single element, so advance the loop
+			 * variable to make this the only iteration.
+			 */
+			phaseidx++;
+		}
 		else
 		{
 			aggnode = node;
@@ -3550,6 +4373,35 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
 			continue;
 		}
+		else if (aggnode->aggstrategy == AGG_INDEX)
+		{
+			AggStatePerPhase phasedata = &aggstate->phases[0];
+			AggStatePerIndex perindex;
+			Bitmapset *cols;
+			
+			Assert(phase == 0);
+			Assert(sortnode);
+
+			i = phasedata->numsets++;
+			
+			/* phase 0 always points to the "real" Agg in the index case */
+			phasedata->aggnode = node;
+			phasedata->aggstrategy = node->aggstrategy;
+			phasedata->sortnode = sortnode;
+
+			perindex = &aggstate->perindex[i];
+			perindex->aggnode = aggnode;
+			aggstate->index_sort = sortnode;
+
+			phasedata->gset_lengths[i] = perindex->numKeyCols = aggnode->numCols;
+
+			cols = NULL;
+			for (j = 0; j < aggnode->numCols; ++j)
+				cols = bms_add_member(cols, aggnode->grpColIdx[j]);
+				
+			phasedata->grouped_cols[i] = cols;
+			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
+		}
 		else
 		{
 			AggStatePerPhase phasedata = &aggstate->phases[++phase];
@@ -3670,7 +4522,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 									 * (numGroupingSets + numHashes));
 	pergroups = aggstate->all_pergroups;
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy != AGG_HASHED && node->aggstrategy != AGG_INDEX)
 	{
 		for (i = 0; i < numGroupingSets; i++)
 		{
@@ -3685,18 +4537,15 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	/*
 	 * Hashing can only appear in the initial phase.
 	 */
-	if (use_hashing)
+	if (use_hashing || use_index)
 	{
 		Plan	   *outerplan = outerPlan(node);
 		double		totalGroups = 0;
 
-		aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsMinimalTuple);
-		aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsVirtual);
-
-		/* this is an array of pointers, not structures */
-		aggstate->hash_pergroup = pergroups;
+		aggstate->spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsMinimalTuple);
+		aggstate->spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsVirtual);
 
 		aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
 													  outerplan->plan_width,
@@ -3711,20 +4560,115 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		for (int k = 0; k < aggstate->num_hashes; k++)
 			totalGroups += aggstate->perhash[k].aggnode->numGroups;
 
-		hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
-							&aggstate->hash_mem_limit,
-							&aggstate->hash_ngroups_limit,
-							&aggstate->hash_planned_partitions);
-		find_hash_columns(aggstate);
+		agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
+					   &aggstate->spill_mem_limit,
+					   &aggstate->spill_ngroups_limit,
+					   &aggstate->spill_planned_partitions);
 
-		/* Skip massive memory allocation if we are just doing EXPLAIN */
-		if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-			build_hash_tables(aggstate);
+		if (use_hashing)
+		{
+			/* this is an array of pointers, not structures */
+			aggstate->hash_pergroup = pergroups;
+	
+			find_hash_columns(aggstate);
+
+			/* Skip massive memory allocation if we are just doing EXPLAIN */
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_hash_tables(aggstate);
+			aggstate->table_filled = false;
+		}
+		else
+		{
+			find_index_columns(aggstate);
+
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_index(aggstate);
+			aggstate->index_filled = false;
+		}
 
-		aggstate->table_filled = false;
 
 		/* Initialize this to 1, meaning nothing spilled, yet */
-		aggstate->hash_batches_used = 1;
+		aggstate->spill_batches_used = 1;
+	}
+
+	/*
+	 * For index aggregation a disk spill may be required, which we resolve
+	 * with an external merge.  But the spilled tuples are already projected,
+	 * so they have a different TupleDesc than the ones used in memory
+	 * (inputDesc and indexDesc).
+	 */
+	if (use_index)
+	{
+		AggStatePerIndex perindex = aggstate->perindex;
+		ListCell *lc;
+		List *targetlist = aggstate->ss.ps.plan->targetlist;
+		AttrNumber *attr_mapping_tl = 
+						palloc0(sizeof(AttrNumber) * list_length(targetlist));
+		AttrNumber *keyColIdxResult;
+
+		/*
+		 * Build the grouping column attribute mapping and store it in
+		 * attr_mapping_tl.  If there is no such mapping (the attribute is
+		 * projected away), InvalidAttrNumber is set; otherwise it is the
+		 * index of the indexDesc column storing this attribute.
+		 */
+		foreach (lc, targetlist)
+		{
+			TargetEntry *te = (TargetEntry *)lfirst(lc);
+			Var *group_var;
+
+			/* All grouping expressions in targetlist stored as OUTER Vars */
+			if (!IsA(te->expr, Var))
+				continue;
+			
+			group_var = (Var *)te->expr;
+			if (group_var->varno != OUTER_VAR)
+				continue;
+
+			attr_mapping_tl[foreach_current_index(lc)] = group_var->varattno;
+		}
+
+		/* The mapping is built; now create the reverse mapping */
+		keyColIdxResult = palloc0(sizeof(AttrNumber) * list_length(outerPlan(node)->targetlist));
+		for (i = 0; i < list_length(targetlist); ++i)
+		{
+			AttrNumber outer_attno = attr_mapping_tl[i];
+			AttrNumber existingIdx;
+
+			if (!AttributeNumberIsValid(outer_attno))
+				continue;
+
+			existingIdx = keyColIdxResult[outer_attno - 1];
+			
+			/* attnumbers can be duplicated, so keep the first one */
+			if (AttributeNumberIsValid(existingIdx) && existingIdx <= outer_attno)
+				continue;
+
+			/*
+			 * A column can be referenced in the query, but the planner may
+			 * decide to remove it from the grouping.
+			 */
+			if (!bms_is_member(outer_attno, all_grouped_cols))
+				continue;
+
+			keyColIdxResult[outer_attno - 1] = i + 1;
+		}
+
+		perindex->idxKeyColIdxTL = palloc(sizeof(AttrNumber) * perindex->numKeyCols);
+		for (i = 0; i < perindex->numKeyCols; ++i)
+		{
+			AttrNumber attno = keyColIdxResult[perindex->idxKeyColIdxInput[i] - 1];
+			if (!AttributeNumberIsValid(attno))
+				elog(ERROR, "could not locate group by attributes in targetlist for index mapping");
+
+			perindex->idxKeyColIdxTL[i] = attno;
+		}
+
+		pfree(attr_mapping_tl);
+		pfree(keyColIdxResult);
+
+		perindex->mergeslot = ExecInitExtraTupleSlot(estate,
+													 aggstate->ss.ps.ps_ResultTupleDesc, 
+													 &TTSOpsMinimalTuple);
 	}
 
 	/*
@@ -3737,13 +4681,19 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		aggstate->current_phase = 0;
 		initialize_phase(aggstate, 0);
-		select_current_set(aggstate, 0, true);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
+	}
+	else if (node->aggstrategy == AGG_INDEX)
+	{
+		aggstate->current_phase = 0;
+		initialize_phase(aggstate, 0);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
 	}
 	else
 	{
 		aggstate->current_phase = 1;
 		initialize_phase(aggstate, 1);
-		select_current_set(aggstate, 0, false);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_SORT);
 	}
 
 	/*
@@ -4071,8 +5021,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
 	{
 		AggStatePerPhase phase = &aggstate->phases[phaseidx];
-		bool		dohash = false;
-		bool		dosort = false;
+		int			strategy;
 
 		/* phase 0 doesn't necessarily exist */
 		if (!phase->aggnode)
@@ -4084,8 +5033,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			 * Phase one, and only phase one, in a mixed agg performs both
 			 * sorting and aggregation.
 			 */
-			dohash = true;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_HASH | GROUPING_STRATEGY_SORT;
 		}
 		else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
 		{
@@ -4099,19 +5047,20 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		else if (phase->aggstrategy == AGG_PLAIN ||
 				 phase->aggstrategy == AGG_SORTED)
 		{
-			dohash = false;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_SORT;
 		}
 		else if (phase->aggstrategy == AGG_HASHED)
 		{
-			dohash = true;
-			dosort = false;
+			strategy = GROUPING_STRATEGY_HASH;
+		}
+		else if (phase->aggstrategy == AGG_INDEX)
+		{
+			strategy = GROUPING_STRATEGY_INDEX;
 		}
 		else
 			Assert(false);
 
-		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
-											 false);
+		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, strategy, false);
 
 		/* cache compiled expression for outer slot without NULL check */
 		phase->evaltrans_cache[0][0] = phase->evaltrans;
@@ -4415,9 +5364,9 @@ ExecEndAgg(AggState *node)
 
 		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
 		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		si->hash_batches_used = node->hash_batches_used;
-		si->hash_disk_used = node->hash_disk_used;
-		si->hash_mem_peak = node->hash_mem_peak;
+		si->hash_batches_used = node->spill_batches_used;
+		si->hash_disk_used = node->spill_disk_used;
+		si->hash_mem_peak = node->spill_mem_peak;
 	}
 
 	/* Make sure we have closed any open tuplesorts */
@@ -4427,7 +5376,10 @@ ExecEndAgg(AggState *node)
 	if (node->sort_out)
 		tuplesort_end(node->sort_out);
 
-	hashagg_reset_spill_state(node);
+	if (node->aggstrategy == AGG_INDEX)
+		indexagg_reset_spill_state(node);
+	else
+		hashagg_reset_spill_state(node);
 
 	/* Release hash tables too */
 	if (node->hash_metacxt != NULL)
@@ -4440,6 +5392,26 @@ ExecEndAgg(AggState *node)
 		MemoryContextDelete(node->hash_tuplescxt);
 		node->hash_tuplescxt = NULL;
 	}
+	if (node->index_metacxt != NULL)
+	{
+		MemoryContextDelete(node->index_metacxt);
+		node->index_metacxt = NULL;
+	}
+	if (node->index_entrycxt != NULL)
+	{
+		MemoryContextDelete(node->index_entrycxt);
+		node->index_entrycxt = NULL;
+	}
+	if (node->index_nodecxt != NULL)
+	{
+		MemoryContextDelete(node->index_nodecxt);
+		node->index_nodecxt = NULL;
+	}
+	if (node->mergestate)
+	{
+		tuplesort_end(node->mergestate);
+		node->mergestate = NULL;
+	}
 
 	for (transno = 0; transno < node->numtrans; transno++)
 	{
@@ -4457,6 +5429,8 @@ ExecEndAgg(AggState *node)
 		ReScanExprContext(node->aggcontexts[setno]);
 	if (node->hashcontext)
 		ReScanExprContext(node->hashcontext);
+	if (node->indexcontext)
+		ReScanExprContext(node->indexcontext);
 
 	outerPlan = outerPlanState(node);
 	ExecEndNode(outerPlan);
@@ -4492,12 +5466,27 @@ ExecReScanAgg(AggState *node)
 		 * we can just rescan the existing hash table; no need to build it
 		 * again.
 		 */
-		if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
 			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
 		{
 			ResetTupleHashIterator(node->perhash[0].hashtable,
 								   &node->perhash[0].hashiter);
-			select_current_set(node, 0, true);
+			select_current_set(node, 0, GROUPING_STRATEGY_HASH);
+			return;
+		}
+	}
+
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		if (!node->index_filled)
+			return;
+
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
+			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
+		{
+			AggStatePerIndex perindex = node->perindex;
+			ResetTupleIndexIterator(perindex->index, &perindex->iter);
+			select_current_set(node, 0, GROUPING_STRATEGY_INDEX);
 			return;
 		}
 	}
@@ -4551,9 +5540,9 @@ ExecReScanAgg(AggState *node)
 	{
 		hashagg_reset_spill_state(node);
 
-		node->hash_ever_spilled = false;
-		node->hash_spill_mode = false;
-		node->hash_ngroups_current = 0;
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
 
 		ReScanExprContext(node->hashcontext);
 		/* Rebuild empty hash table(s) */
@@ -4561,10 +5550,33 @@ ExecReScanAgg(AggState *node)
 		node->table_filled = false;
 		/* iterator will be reset when the table is filled */
 
-		hashagg_recompile_expressions(node, false, false);
+		agg_recompile_expressions(node, false, false);
 	}
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		indexagg_reset_spill_state(node);
+
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
+
+		ReScanExprContext(node->indexcontext);
+		MemoryContextReset(node->index_entrycxt);
+		MemoryContextReset(node->index_nodecxt);
+
+		build_index(node);
+		node->index_filled = false;
+
+		agg_recompile_expressions(node, false, false);
+
+		if (node->mergestate)
+		{
+			tuplesort_end(node->mergestate);
+			node->mergestate = NULL;
+		}
+	}
+	else if (node->aggstrategy != AGG_HASHED)
 	{
 		/*
 		 * Reset the per-group state (in particular, mark transvalues null)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 5d4411dc33f..b53807ec22e 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1910,6 +1910,7 @@ static void
 inittapestate(Tuplesortstate *state, int maxTapes)
 {
 	int64		tapeSpace;
+	Size		memtuplesSize;
 
 	/*
 	 * Decrease availMem to reflect the space needed for tape buffers; but
@@ -1922,7 +1923,16 @@ inittapestate(Tuplesortstate *state, int maxTapes)
 	 */
 	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
 
-	if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
+	/*
+	 * In merge mode, during initial run creation, we do not use the
+	 * in-memory tuples array and instead write to the tapes directly.
+	 */
+	if (state->memtuples != NULL)
+		memtuplesSize = GetMemoryChunkSpace(state->memtuples);
+	else
+		memtuplesSize = 0;
+
+	if (tapeSpace + memtuplesSize < state->allowedMem)
 		USEMEM(state, tapeSpace);
 
 	/*
@@ -2041,11 +2051,14 @@ mergeruns(Tuplesortstate *state)
 
 	/*
 	 * We no longer need a large memtuples array.  (We will allocate a smaller
-	 * one for the heap later.)
+	 * one for the heap later.)  Note that in merge state this array can be NULL.
 	 */
-	FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
-	pfree(state->memtuples);
-	state->memtuples = NULL;
+	if (state->memtuples)
+	{
+		FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
+		pfree(state->memtuples);
+		state->memtuples = NULL;
+	}
 
 	/*
 	 * Initialize the slab allocator.  We need one slab slot per input tape,
@@ -3167,3 +3180,189 @@ ssup_datum_int32_cmp(Datum x, Datum y, SortSupport ssup)
 	else
 		return 0;
 }
+
+/*
+ *    tuplemerge_begin_common
+ *
+ * Create a new Tuplesortstate that performs merging only.  This is used
+ * when we know the input is already sorted but stored across multiple
+ * tapes, so we only have to merge them.
+ *
+ * Unlike tuplesort_begin_common it does not accept sortopt, because none
+ * of the current options (random access, bounded sort) are supported here.
+ */
+Tuplesortstate *
+tuplemerge_begin_common(int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state;
+	MemoryContext maincontext;
+	MemoryContext sortcontext;
+	MemoryContext oldcontext;
+
+	/*
+	 * Memory context surviving tuplesort_reset.  This memory context holds
+	 * data which is useful to keep while sorting multiple similar batches.
+	 */
+	maincontext = AllocSetContextCreate(CurrentMemoryContext,
+										"TupleMerge main",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Create a working memory context for one sort operation.  The content of
+	 * this context is deleted by tuplesort_reset.
+	 */
+	sortcontext = AllocSetContextCreate(maincontext,
+										"TupleMerge merge",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Make the Tuplesortstate within the per-sortstate context.  This way, we
+	 * don't need a separate pfree() operation for it at shutdown.
+	 */
+	oldcontext = MemoryContextSwitchTo(maincontext);
+
+	state = (Tuplesortstate *) palloc0(sizeof(Tuplesortstate));
+
+	if (trace_sort)
+		pg_rusage_init(&state->ru_start);
+
+	state->base.sortopt = TUPLESORT_NONE;
+	state->base.tuples = true;
+	state->abbrevNext = 10;
+
+	/*
+	 * workMem is forced to be at least 64KB, the current minimum valid value
+	 * for the work_mem GUC.  This is a defense against parallel sort callers
+	 * that divide out memory among many workers in a way that leaves each
+	 * with very little memory.
+	 */
+	state->allowedMem = Max(workMem, 64) * (int64) 1024;
+	state->base.sortcontext = sortcontext;
+	state->base.maincontext = maincontext;
+
+	/*
+	 * After all of the other non-parallel-related state, we set up all of
+	 * the state needed for each batch.
+	 */
+
+	/*
+	 * Merging does not accept RANDOMACCESS, so the only possible tuple
+	 * context is Bump, which saves some cycles.
+	 */
+	state->base.tuplecontext = BumpContextCreate(state->base.sortcontext,
+												 "Caller tuples",
+												 ALLOCSET_DEFAULT_SIZES);
+
+	state->status = TSS_BUILDRUNS;
+	state->bounded = false;
+	state->boundUsed = false;
+	state->availMem = state->allowedMem;
+	
+	/*
+	 * When performing a merge we do not need the in-memory array for
+	 * sorting, so leave memtuples NULL.  memtupcount is reused to count
+	 * the tuples written to the current run, while memtupsize and
+	 * growmemtuples are kept valid for code that may inspect them.
+	 */
+	state->memtuples = NULL;
+	state->memtupcount = 0;
+	state->memtupsize = INITIAL_MEMTUPSIZE;
+	state->growmemtuples = true;
+	state->slabAllocatorUsed = false;
+
+	/*
+	 * Tape variables (inputTapes, outputTapes, etc.) will be initialized by
+	 * inittapes(), if needed.
+	 */
+	state->result_tape = NULL;	/* flag that result tape has not been formed */
+	state->tapeset = NULL;
+	
+	inittapes(state, true);
+
+	/*
+	 * Initialize parallel-related state based on coordination information
+	 * from caller
+	 */
+	if (!coordinate)
+	{
+		/* Serial sort */
+		state->shared = NULL;
+		state->worker = -1;
+		state->nParticipants = -1;
+	}
+	else if (coordinate->isWorker)
+	{
+		/* Parallel worker produces exactly one final run from all input */
+		state->shared = coordinate->sharedsort;
+		state->worker = worker_get_identifier(state);
+		state->nParticipants = -1;
+	}
+	else
+	{
+		/* Parallel leader state only used for final merge */
+		state->shared = coordinate->sharedsort;
+		state->worker = -1;
+		state->nParticipants = coordinate->nParticipants;
+		Assert(state->nParticipants >= 1);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Start a new sorted run.  If no tuples have been written to the current
+ * run yet, just keep using it; otherwise switch to a new destination tape.
+ */
+void
+tuplemerge_start_run(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+		return;
+
+	selectnewtape(state);
+	state->memtupcount = 0;
+}
+
+/*
+ * All runs have been written: merge them so the result can be read back
+ * in sorted order.
+ */
+void
+tuplemerge_performmerge(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+	{
+		/*
+		 * We started a new run, but no tuples were written to it.  mergeruns
+		 * expects each run to contain at least 1 tuple, otherwise it will
+		 * fail to even fill the initial merge heap.
+		 */
+		state->nOutputRuns--;
+	}
+	else
+		state->memtupcount = 0;
+
+	mergeruns(state);
+
+	state->current = 0;
+	state->eof_reached = false;
+	state->markpos_block = 0L;
+	state->markpos_offset = 0;
+	state->markpos_eof = false;
+}
+
+/*
+ * Write one tuple directly to the current destination tape.  tuplen is
+ * accepted for symmetry with tuplesort_puttuple_common but is not used.
+ */
+void
+tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple, Size tuplen)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->base.sortcontext);
+
+	Assert(state->destTape);
+	WRITETUP(state, state->destTape, tuple);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	state->memtupcount++;
+}
+
+/*
+ * Mark the end of the current run on its tape, if any tuples were written.
+ */
+void
+tuplemerge_end_run(Tuplesortstate *state)
+{
+	if (state->memtupcount != 0)
+	{
+		markrunend(state->destTape);
+	}
+}
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index 9751a7fc495..e21b9c33ac2 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -2071,3 +2071,108 @@ readtup_datum(Tuplesortstate *state, SortTuple *stup,
 	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
 		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
 }
+
+Tuplesortstate *
+tuplemerge_begin_heap(TupleDesc tupDesc,
+					  int nkeys, AttrNumber *attNums,
+					  Oid *sortOperators, Oid *sortCollations,
+					  bool *nullsFirstFlags,
+					  int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state = tuplemerge_begin_common(workMem, coordinate);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+
+	Assert(nkeys > 0);
+
+	if (trace_sort)
+		elog(LOG,
+			 "begin tuple merge: nkeys = %d, workMem = %d", nkeys, workMem);
+
+	base->nKeys = nkeys;
+
+	TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
+								false,	/* no unique check */
+								nkeys,
+								workMem,
+								false,
+								PARALLEL_SORT(coordinate));
+
+	base->removeabbrev = removeabbrev_heap;
+	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
+	base->writetup = writetup_heap;
+	base->readtup = readtup_heap;
+	base->haveDatum1 = true;
+	base->arg = tupDesc;		/* assume we need not copy tupDesc */
+
+	/* Prepare SortSupport data for each column */
+	base->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		SortSupport sortKey = base->sortKeys + i;
+
+		Assert(attNums[i] != 0);
+		Assert(sortOperators[i] != 0);
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization
+	 * is only of value to pass-by-value types anyway, whereas abbreviated
+	 * keys are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !base->sortKeys->abbrev_converter)
+		base->onlyKey = base->sortKeys;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Accept one tuple while building a run in merge-only mode.
+ */
+void
+tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+	TupleDesc	tupDesc = (TupleDesc) base->arg;
+	SortTuple	stup;
+	MinimalTuple tuple;
+	HeapTupleData htup;
+	Size		tuplen;
+
+	/* copy the tuple into sort storage */
+	tuple = ExecCopySlotMinimalTuple(slot);
+	stup.tuple = tuple;
+	/* set up first-column key value */
+	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
+	stup.datum1 = heap_getattr(&htup,
+							   base->sortKeys[0].ssup_attno,
+							   tupDesc,
+							   &stup.isnull1);
+
+	/* GetMemoryChunkSpace is not supported for bump contexts */
+	if (TupleSortUseBumpTupleCxt(base->sortopt))
+		tuplen = MAXALIGN(tuple->t_len);
+	else
+		tuplen = GetMemoryChunkSpace(tuple);
+
+	tuplemerge_puttuple_common(state, &stup, tuplen);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6192cc8d143..7c9efe77ab9 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -393,8 +393,16 @@ extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
+
+/*
+ * Which strategy to use for aggregation/grouping.
+ */
+#define GROUPING_STRATEGY_SORT			1
+#define GROUPING_STRATEGY_HASH			(1 << 1)
+#define GROUPING_STRATEGY_INDEX			(1 << 2)
+
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
-									bool doSort, bool doHash, bool nullcheck);
+									int groupStrategy, bool nullcheck);
 extern ExprState *ExecBuildHash32FromAttrs(TupleDesc desc,
 										   const TupleTableSlotOps *ops,
 										   FmgrInfo *hashfunctions,
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 6c4891bbaeb..8361d000878 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -321,6 +321,33 @@ typedef struct AggStatePerHashData
 	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */
 }			AggStatePerHashData;
 
+/*
+ * AggStatePerIndexData - per-index state
+ *
+ * Logic is the same as for AggStatePerHashData - one of these for each
+ * grouping set.
+ */
+typedef struct AggStatePerIndexData
+{
+	TupleIndex	index;			/* current in-memory index data */
+	MemoryContext metacxt;		/* memory context containing TupleIndex */
+	MemoryContext tempctx;		/* short-lived context */
+	TupleTableSlot *indexslot; 	/* slot for loading index */
+	int			numCols;		/* total number of columns in index tuple */
+	int			numKeyCols;		/* number of key columns in index tuple */
+	int			largestGrpColIdx;	/* largest col required for comparison */
+	AttrNumber *idxKeyColIdxInput;	/* key column indices in input slot */
+	AttrNumber *idxKeyColIdxIndex;	/* key column indices in index tuples */
+	TupleIndexIteratorData iter;	/* iterator state for index */
+	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */	
+
+	/* state used only for spill mode */
+	AttrNumber	*idxKeyColIdxTL;	/* key column indices in target list */
+	FmgrInfo    *hashfunctions;	/* tuple hashing function */
+	ExprState   *indexhashexpr;	/* ExprState for hashing index datatype(s) */
+	ExprContext *exprcontext;	/* expression context */
+	TupleTableSlot *mergeslot;	/* slot for loading tuple during merge */
+}			AggStatePerIndexData;
 
 extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
@@ -328,9 +355,9 @@ extern void ExecReScanAgg(AggState *node);
 
 extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 								Size transitionSpace);
-extern void hash_agg_set_limits(double hashentrysize, double input_groups,
-								int used_bits, Size *mem_limit,
-								uint64 *ngroups_limit, int *num_partitions);
+extern void agg_set_limits(double hashentrysize, double input_groups,
+						   int used_bits, Size *mem_limit,
+						   uint64 *ngroups_limit, int *num_partitions);
 
 /* parallel instrumentation support */
 extern void ExecAggEstimate(AggState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 99ee472b51f..3bba2359e11 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2613,6 +2613,7 @@ typedef struct AggStatePerTransData *AggStatePerTrans;
 typedef struct AggStatePerGroupData *AggStatePerGroup;
 typedef struct AggStatePerPhaseData *AggStatePerPhase;
 typedef struct AggStatePerHashData *AggStatePerHash;
+typedef struct AggStatePerIndexData *AggStatePerIndex;
 
 typedef struct AggState
 {
@@ -2628,17 +2629,18 @@ typedef struct AggState
 	AggStatePerAgg peragg;		/* per-Aggref information */
 	AggStatePerTrans pertrans;	/* per-Trans state information */
 	ExprContext *hashcontext;	/* econtexts for long-lived data (hashtable) */
+	ExprContext *indexcontext;	/* econtexts for long-lived data (index) */
 	ExprContext **aggcontexts;	/* econtexts for long-lived data (per GS) */
 	ExprContext *tmpcontext;	/* econtext for input expressions */
-#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
+#define FIELDNO_AGGSTATE_CURAGGCONTEXT 15
 	ExprContext *curaggcontext; /* currently active aggcontext */
 	AggStatePerAgg curperagg;	/* currently active aggregate, if any */
-#define FIELDNO_AGGSTATE_CURPERTRANS 16
+#define FIELDNO_AGGSTATE_CURPERTRANS 17
 	AggStatePerTrans curpertrans;	/* currently active trans state, if any */
 	bool		input_done;		/* indicates end of input */
 	bool		agg_done;		/* indicates completion of Agg scan */
 	int			projected_set;	/* The last projected grouping set */
-#define FIELDNO_AGGSTATE_CURRENT_SET 20
+#define FIELDNO_AGGSTATE_CURRENT_SET 21
 	int			current_set;	/* The current grouping set being evaluated */
 	Bitmapset  *grouped_cols;	/* grouped cols in current projection */
 	List	   *all_grouped_cols;	/* list of all grouped cols in DESC order */
@@ -2660,32 +2662,43 @@ typedef struct AggState
 	int			num_hashes;
 	MemoryContext hash_metacxt; /* memory for hash table bucket array */
 	MemoryContext hash_tuplescxt;	/* memory for hash table tuples */
-	struct LogicalTapeSet *hash_tapeset;	/* tape set for hash spill tapes */
-	struct HashAggSpill *hash_spills;	/* HashAggSpill for each grouping set,
-										 * exists only during first pass */
-	TupleTableSlot *hash_spill_rslot;	/* for reading spill files */
-	TupleTableSlot *hash_spill_wslot;	/* for writing spill files */
-	List	   *hash_batches;	/* hash batches remaining to be processed */
-	bool		hash_ever_spilled;	/* ever spilled during this execution? */
-	bool		hash_spill_mode;	/* we hit a limit during the current batch
-									 * and we must not create new groups */
-	Size		hash_mem_limit; /* limit before spilling hash table */
-	uint64		hash_ngroups_limit; /* limit before spilling hash table */
-	int			hash_planned_partitions;	/* number of partitions planned
-											 * for first pass */
-	double		hashentrysize;	/* estimate revised during execution */
-	Size		hash_mem_peak;	/* peak hash table memory usage */
-	uint64		hash_ngroups_current;	/* number of groups currently in
-										 * memory in all hash tables */
-	uint64		hash_disk_used; /* kB of disk space used */
-	int			hash_batches_used;	/* batches used during entire execution */
-
 	AggStatePerHash perhash;	/* array of per-hashtable data */
 	AggStatePerGroup *hash_pergroup;	/* grouping set indexed array of
 										 * per-group pointers */
+	/* Fields used for managing spill mode in hash and index aggs */
+	struct LogicalTapeSet *spill_tapeset;	/* tape set for hash spill tapes */
+	struct HashAggSpill *spills;	/* HashAggSpill for each grouping set,
+									 * exists only during first pass */
+	TupleTableSlot *spill_rslot;	/* for reading spill files */
+	TupleTableSlot *spill_wslot;	/* for writing spill files */
+	List	   *spill_batches;	/* hash batches remaining to be processed */
+
+	bool		spill_ever_happened;	/* ever spilled during this execution? */
+	bool		spill_mode;	/* we hit a limit during the current batch
+							 * and we must not create new groups */
+	Size		spill_mem_limit; /* limit before spilling hash table or index */
+	uint64		spill_ngroups_limit; /* limit before spilling hash table or index */
+	int			spill_planned_partitions;	/* number of partitions planned
+											 * for first pass */
+	double		hashentrysize;	/* estimate revised during execution */
+	Size		spill_mem_peak;	/* peak memory usage of hash table or index */
+	uint64		spill_ngroups_current;	/* number of groups currently in
+										 * memory in all hash tables */
+	uint64		spill_disk_used; /* kB of disk space used */
+	int			spill_batches_used;	/* batches used during entire execution */
+
+	/* these fields are used in AGG_INDEX mode: */
+	AggStatePerIndex perindex;	/* pointer to per-index state data */
+	bool			index_filled;	/* index filled yet? */
+	MemoryContext	index_metacxt;	/* memory for index structure */
+	MemoryContext	index_nodecxt;	/* memory for index nodes */
+	MemoryContext	index_entrycxt;	/* memory for index entries */
+	Sort		   *index_sort;		/* ordering information for index */
+	Tuplesortstate *mergestate;		/* state for merging projected tuples if
+									 * spill occurred */
 
 	/* support for evaluation of agg input expressions: */
-#define FIELDNO_AGGSTATE_ALL_PERGROUPS 54
+#define FIELDNO_AGGSTATE_ALL_PERGROUPS 62
 	AggStatePerGroup *all_pergroups;	/* array of first ->pergroups, than
 										 * ->hash_pergroup */
 	SharedAggInfo *shared_info; /* one entry per worker */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fb3957e75e5..b0e2d781c01 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -365,6 +365,7 @@ typedef enum AggStrategy
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
 	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_INDEX,					/* grouped agg, build index for input */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..b19dacf5de4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1219,7 +1219,7 @@ typedef struct Agg
 	/* grouping sets to use */
 	List	   *groupingSets;
 
-	/* chained Agg/Sort nodes */
+	/* chained Agg/Sort nodes; for AGG_INDEX this holds a single Sort node */
 	List	   *chain;
 } Agg;
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 0bf55902aa1..f372c3e7e0a 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -475,6 +475,21 @@ extern GinTuple *tuplesort_getgintuple(Tuplesortstate *state, Size *len,
 									   bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
-
+/*
+ * Special state for merge-only mode.
+ */
+extern Tuplesortstate *tuplemerge_begin_common(int workMem,
+											   SortCoordinate coordinate);
+extern Tuplesortstate *tuplemerge_begin_heap(TupleDesc tupDesc,
+											int nkeys, AttrNumber *attNums,
+											Oid *sortOperators, Oid *sortCollations,
+											bool *nullsFirstFlags,
+											int workMem, SortCoordinate coordinate);
+extern void tuplemerge_start_run(Tuplesortstate *state);
+extern void tuplemerge_end_run(Tuplesortstate *state);
+extern void tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple,
+									   Size tuplen);
+extern void tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot);
+extern void tuplemerge_performmerge(Tuplesortstate *state);
 
 #endif							/* TUPLESORT_H */
-- 
2.43.0
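As an aside, before the planner patch: the final step that tuplemerge_performmerge() drives (via mergeruns()) is a classic k-way merge of sorted runs through a min-heap. Below is a standalone sketch of that technique in plain C, with tapes replaced by in-memory integer arrays; the names `merge_runs`, `HeapEntry`, and `sift_down` are illustrative and not from the patch.

```c
#include <assert.h>

/*
 * One heap entry: the next unconsumed value of a run, plus which run it
 * came from and the position of the following value in that run.
 */
typedef struct
{
	int			value;
	int			run;
	int			pos;
} HeapEntry;

static void
sift_down(HeapEntry *heap, int nheap, int i)
{
	for (;;)
	{
		int			left = 2 * i + 1;
		int			right = 2 * i + 2;
		int			smallest = i;
		HeapEntry	tmp;

		if (left < nheap && heap[left].value < heap[smallest].value)
			smallest = left;
		if (right < nheap && heap[right].value < heap[smallest].value)
			smallest = right;
		if (smallest == i)
			break;
		tmp = heap[i];
		heap[i] = heap[smallest];
		heap[smallest] = tmp;
		i = smallest;
	}
}

/*
 * Merge nruns individually sorted arrays into "out", returning the number
 * of values written.  Each run contributes its smallest unread value to a
 * min-heap; we repeatedly pop the global minimum and refill from that run.
 */
int
merge_runs(const int **runs, const int *lens, int nruns, int *out)
{
	HeapEntry	heap[16];		/* enough for this sketch's small nruns */
	int			nheap = 0;
	int			nout = 0;

	assert(nruns <= 16);

	/* Prime the heap with the first value of every non-empty run. */
	for (int r = 0; r < nruns; r++)
	{
		if (lens[r] > 0)
		{
			int			i = nheap++;
			HeapEntry	e = {runs[r][0], r, 1};

			heap[i] = e;
			while (i > 0 && heap[(i - 1) / 2].value > heap[i].value)
			{
				HeapEntry	tmp = heap[i];

				heap[i] = heap[(i - 1) / 2];
				heap[(i - 1) / 2] = tmp;
				i = (i - 1) / 2;
			}
		}
	}

	while (nheap > 0)
	{
		HeapEntry	top = heap[0];

		out[nout++] = top.value;
		if (top.pos < lens[top.run])
		{
			/* Refill the root from the same run it came from. */
			heap[0].value = runs[top.run][top.pos];
			heap[0].pos = top.pos + 1;
		}
		else
			heap[0] = heap[--nheap];	/* run exhausted */
		sift_down(heap, nheap, 0);
	}
	return nout;
}
```

The real code reads from logical tapes and compares full SortTuples with SortSupport, but the control flow of the heap-based merge is the same.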

0003-make-use-of-IndexAggregate-in-planner-and-explain.patchtext/x-patch; charset=UTF-8; name=0003-make-use-of-IndexAggregate-in-planner-and-explain.patchDownload
From d1e5ec099977a922a253f15ea49052076ca7ca1e Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:34:18 +0300
Subject: [PATCH 3/4] make use of IndexAggregate in planner and explain

This commit adds usage of IndexAggregate in planner and explain (analyze).

We calculate cost of IndexAggregate and add AGG_INDEX node to the pathlist.
Cost of this node is cost of building B+tree (in memory), disk spill and
final external merge.

For EXPLAIN there is only little change - show sort information in "Group Key".
---
 src/backend/commands/explain.c            | 101 ++++++++++++++++++----
 src/backend/optimizer/path/costsize.c     |  90 +++++++++++++++----
 src/backend/optimizer/plan/createplan.c   |  15 +++-
 src/backend/optimizer/plan/planner.c      |  35 ++++++++
 src/backend/optimizer/util/pathnode.c     |   9 ++
 src/backend/utils/misc/guc_parameters.dat |   7 ++
 src/include/nodes/pathnodes.h             |   3 +-
 src/include/optimizer/cost.h              |   1 +
 8 files changed, 222 insertions(+), 39 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..83da2bc4a94 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -134,7 +134,7 @@ static void show_recursive_union_info(RecursiveUnionState *rstate,
 									  ExplainState *es);
 static void show_memoize_info(MemoizeState *mstate, List *ancestors,
 							  ExplainState *es);
-static void show_hashagg_info(AggState *aggstate, ExplainState *es);
+static void show_agg_spill_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1556,6 +1556,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						pname = "MixedAggregate";
 						strategy = "Mixed";
 						break;
+					case AGG_INDEX:
+						pname = "IndexAggregate";
+						strategy = "Indexed";
+						break;
 					default:
 						pname = "Aggregate ???";
 						strategy = "???";
@@ -2200,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
-			show_hashagg_info((AggState *) planstate, es);
+			show_agg_spill_info((AggState *) planstate, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
@@ -2631,6 +2635,24 @@ show_agg_keys(AggState *astate, List *ancestors,
 
 		if (plan->groupingSets)
 			show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+		else if (plan->aggstrategy == AGG_INDEX)
+		{
+			Sort	   *sort = astate->index_sort;
+
+			/*
+			 * Index Agg reorders the GROUP BY keys to match the ORDER BY, so
+			 * the keys themselves are the same; but we should also show other
+			 * useful information about the ordering used, such as direction.
+			 */
+			Assert(sort != NULL);
+			show_sort_group_keys(outerPlanState(astate), "Group Key",
+								 plan->numCols, 0,
+								 sort->sortColIdx,
+								 sort->sortOperators,
+								 sort->collations,
+								 sort->nullsFirst,
+								 ancestors, es);
+		}
 		else
 			show_sort_group_keys(outerPlanState(astate), "Group Key",
 								 plan->numCols, 0, plan->grpColIdx,
@@ -3735,47 +3757,67 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 }
 
 /*
- * Show information on hash aggregate memory usage and batches.
+ * Show information on hash or index aggregate memory usage and batches.
  */
 static void
-show_hashagg_info(AggState *aggstate, ExplainState *es)
+show_agg_spill_info(AggState *aggstate, ExplainState *es)
 {
 	Agg		   *agg = (Agg *) aggstate->ss.ps.plan;
-	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->hash_mem_peak);
+	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->spill_mem_peak);
 
 	if (agg->aggstrategy != AGG_HASHED &&
-		agg->aggstrategy != AGG_MIXED)
+		agg->aggstrategy != AGG_MIXED &&
+		agg->aggstrategy != AGG_INDEX)
 		return;
 
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		if (es->costs)
 			ExplainPropertyInteger("Planned Partitions", NULL,
-								   aggstate->hash_planned_partitions, es);
+								   aggstate->spill_planned_partitions, es);
 
 		/*
 		 * During parallel query the leader may have not helped out.  We
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			ExplainPropertyInteger("HashAgg Batches", NULL,
-								   aggstate->hash_batches_used, es);
+								   aggstate->spill_batches_used, es);
 			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
 			ExplainPropertyInteger("Disk Usage", "kB",
-								   aggstate->hash_disk_used, es);
+								   aggstate->spill_disk_used, es);
+		}
+
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64		spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			ExplainPropertyText("Merge Method", mergeMethod, es);
+			ExplainPropertyInteger("Merge Space Used", "kB", spaceUsed, es);
+			ExplainPropertyText("Merge Space Type", spaceType, es);
+		}
 	}
 	else
 	{
 		bool		gotone = false;
 
-		if (es->costs && aggstate->hash_planned_partitions > 0)
+		if (es->costs && aggstate->spill_planned_partitions > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str, "Planned Partitions: %d",
-							 aggstate->hash_planned_partitions);
+							 aggstate->spill_planned_partitions);
 			gotone = true;
 		}
 
@@ -3784,7 +3826,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			if (!gotone)
 				ExplainIndentText(es);
@@ -3792,17 +3834,44 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 				appendStringInfoSpaces(es->str, 2);
 
 			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-							 aggstate->hash_batches_used, memPeakKb);
+							 aggstate->spill_batches_used, memPeakKb);
 			gotone = true;
 
 			/* Only display disk usage if we spilled to disk */
-			if (aggstate->hash_batches_used > 1)
+			if (aggstate->spill_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-								 aggstate->hash_disk_used);
+								 aggstate->spill_disk_used);
 			}
 		}
 
+		/* For index aggregate, show stats for the final merge */
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64 spaceUsed;
+			
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			/*
+			 * If we got here, the previous check (for the memory peak) must
+			 * have succeeded (we cannot go directly to the merge without any
+			 * in-memory work).  Do not check other state; just start a new line.
+			 */
+			appendStringInfoChar(es->str, '\n');
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Merge Method: %s  %s: " INT64_FORMAT "kB",
+							 mergeMethod, spaceType, spaceUsed);
+			gotone = true;
+		}
+
 		if (gotone)
 			appendStringInfoChar(es->str, '\n');
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 5a7283bd2f5..21db1746a41 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool		enable_tidscan = true;
 bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
+bool		enable_indexagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_memoize = true;
@@ -1848,6 +1849,32 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 									rterm->pathtarget->width);
 }
 
+/*
+ * cost_tuplemerge
+ *		Adds to *cost the disk cost of the external merge used in tuplesort.
+ */
+static void
+cost_tuplemerge(double availMem, double input_bytes, double ntuples,
+				Cost comparison_cost, Cost *cost)
+{
+	double		npages = ceil(input_bytes / BLCKSZ);
+	double		nruns = input_bytes / availMem;
+	double		mergeorder = tuplesort_merge_order(availMem);
+	double		log_runs;
+	double		npageaccesses;
+
+	/* Compute logM(r) as log(r) / log(M) */
+	if (nruns > mergeorder)
+		log_runs = ceil(log(nruns) / log(mergeorder));
+	else
+		log_runs = 1.0;
+
+	npageaccesses = 2.0 * npages * log_runs;
+
+	/* Assume 3/4ths of accesses are sequential, 1/4th are not */
+	*cost += npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
+}
+
 /*
  * cost_tuplesort
  *	  Determines and returns the cost of sorting a relation using tuplesort,
@@ -1922,11 +1949,6 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		/*
 		 * We'll have to use a disk-based sort of all the tuples
 		 */
-		double		npages = ceil(input_bytes / BLCKSZ);
-		double		nruns = input_bytes / sort_mem_bytes;
-		double		mergeorder = tuplesort_merge_order(sort_mem_bytes);
-		double		log_runs;
-		double		npageaccesses;
 
 		/*
 		 * CPU costs
@@ -1936,16 +1958,8 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		*startup_cost = comparison_cost * tuples * LOG2(tuples);
 
 		/* Disk costs */
-
-		/* Compute logM(r) as log(r) / log(M) */
-		if (nruns > mergeorder)
-			log_runs = ceil(log(nruns) / log(mergeorder));
-		else
-			log_runs = 1.0;
-		npageaccesses = 2.0 * npages * log_runs;
-		/* Assume 3/4ths of accesses are sequential, 1/4th are not */
-		*startup_cost += npageaccesses *
-			(seq_page_cost * 0.75 + random_page_cost * 0.25);
+		cost_tuplemerge(sort_mem_bytes, input_bytes, tuples, comparison_cost,
+						startup_cost);
 	}
 	else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
 	{
@@ -2770,7 +2784,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
-	else
+	else if (aggstrategy == AGG_HASHED)
 	{
 		/* must be AGG_HASHED */
 		startup_cost = input_total_cost;
@@ -2788,6 +2802,27 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
+	else
+	{
+		/* must be AGG_INDEX */
+		startup_cost = input_total_cost;
+		if (!enable_indexagg)
+			++disabled_nodes;
+
+		startup_cost += aggcosts->transCost.startup;
+		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
+		/* cost of btree building */
+		startup_cost +=   (2.0 * cpu_operator_cost * numGroupCols) /* comparison cost */
+						* LOG2(numGroups)	/* tree height/number of comparisons */
+						* input_tuples;		/* number of tuples */
+		startup_cost += aggcosts->finalCost.startup;
+
+		total_cost = startup_cost;
+		total_cost += aggcosts->finalCost.per_tuple * numGroups;
+		/* cost of retrieving from the in-memory index */
+		total_cost += cpu_tuple_cost * numGroups;
+		output_tuples = numGroups;
+	}
 
 	/*
 	 * Add the disk costs of hash aggregation that spills to disk.
@@ -2802,7 +2837,7 @@ cost_agg(Path *path, PlannerInfo *root,
 	 * Accrue writes (spilled tuples) to startup_cost and to total_cost;
 	 * accrue reads only to total_cost.
 	 */
-	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED)
+	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED || aggstrategy == AGG_INDEX)
 	{
 		double		pages;
 		double		pages_written = 0.0;
@@ -2823,8 +2858,8 @@ cost_agg(Path *path, PlannerInfo *root,
 		hashentrysize = hash_agg_entry_size(list_length(root->aggtransinfos),
 											input_width,
 											aggcosts->transitionSpace);
-		hash_agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
-							&ngroups_limit, &num_partitions);
+		agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
+					   &ngroups_limit, &num_partitions);
 
 		nbatches = Max((numGroups * hashentrysize) / mem_limit,
 					   numGroups / ngroups_limit);
@@ -2861,6 +2896,23 @@ cost_agg(Path *path, PlannerInfo *root,
 		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
 		startup_cost += spill_cost;
 		total_cost += spill_cost;
+
+		/*
+		 * Index agg also writes sorted runs to tape for further merging.
+		 */
+		if (aggstrategy == AGG_INDEX)
+		{
+			double	output_bytes;
+			Cost	comparison_cost;
+			
+			/* size of all projected tuples */
+			output_bytes = path->pathtarget->width * output_tuples;
+			/* default comparison cost */
+			comparison_cost = 2.0 * cpu_operator_cost;
+
+			cost_tuplemerge(work_mem * 1024.0, output_bytes, output_tuples,
+							comparison_cost, &startup_cost);
+		}
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8af091ba647..08ff0f002be 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2158,6 +2158,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 	List	   *quals;
+	List	   *chain;
+	AttrNumber *grpColIdx;
 
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
@@ -2169,17 +2171,24 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 
 	quals = order_qual_clauses(root, best_path->qual);
 
+	grpColIdx = extract_grouping_cols(best_path->groupClause, subplan->targetlist);
+
+	/* For index aggregation, pass down the desired sort order of the grouping columns. */
+	if (best_path->aggstrategy == AGG_INDEX)
+		chain = list_make1(make_sort_from_groupcols(best_path->groupClause, grpColIdx, subplan));
+	else
+		chain = NIL;
+
 	plan = make_agg(tlist, quals,
 					best_path->aggstrategy,
 					best_path->aggsplit,
 					list_length(best_path->groupClause),
-					extract_grouping_cols(best_path->groupClause,
-										  subplan->targetlist),
+					grpColIdx,
 					extract_grouping_ops(best_path->groupClause),
 					extract_grouping_collations(best_path->groupClause,
 												subplan->targetlist),
 					NIL,
-					NIL,
+					chain,
 					best_path->numGroups,
 					best_path->transitionSpace,
 					subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0e78628bf01..3ca89def817 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3877,6 +3877,21 @@ create_grouping_paths(PlannerInfo *root,
 			 (gd ? gd->any_hashable : grouping_is_hashable(root->processed_groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
+		/*
+		 * Determine whether we should consider an index-based implementation
+		 * of grouping.
+		 *
+		 * This is more restrictive, since the grouping columns must be not
+		 * only sortable (for the B-tree) but also hashable, so that we can
+		 * efficiently spill tuples and later process each batch.
+		 */
+		if (gd == NULL &&
+			root->numOrderedAggs == 0 &&
+			parse->groupClause != NIL &&
+			grouping_is_sortable(root->processed_groupClause) &&
+			grouping_is_hashable(root->processed_groupClause))
+			flags |= GROUPING_CAN_USE_INDEX;
+
 		/*
 		 * Determine whether partial aggregation is possible.
 		 */
@@ -7108,6 +7123,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups = 0;
@@ -7329,6 +7345,25 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 	}
 
+	if (can_index)
+	{
+		/* 
+		 * Generate IndexAgg path.
+		 */
+		Assert(!parse->groupingSets);
+		add_path(grouped_rel, (Path *)
+				 create_agg_path(root,
+								 grouped_rel,
+								 cheapest_path,
+								 grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_SIMPLE,
+								 root->processed_groupClause,
+								 havingQual,
+								 agg_costs,
+								 dNumGroups));
+	}
+
 	/*
 	 * When partitionwise aggregate is used, we might have fully aggregated
 	 * paths in the partial pathlist, because add_paths_to_append_rel() will
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b6be4ddbd01..2bac26055a7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3030,6 +3030,15 @@ create_agg_path(PlannerInfo *root,
 		else
 			pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	}
+	else if (aggstrategy == AGG_INDEX)
+	{
+		/*
+		 * With index aggregation, all grouping columns are used as comparison
+		 * keys, so the output is always sorted.
+		 */
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
+																root->processed_tlist);
+	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..776ccd9e2fd 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -868,6 +868,13 @@
   boot_val => 'true',
 },
 
+{ name => 'enable_indexagg', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of index aggregation plans.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'enable_indexagg',
+  boot_val => 'true',
+},
+
 { name => 'enable_indexonlyscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
   short_desc => 'Enables the planner\'s use of index-only-scan plans.',
   flags => 'GUC_EXPLAIN',
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..f4b2d35b1d9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -3518,7 +3518,8 @@ typedef struct JoinPathExtraData
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
-#define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_CAN_USE_INDEX		0x0004
+#define GROUPING_CAN_PARTIAL_AGG	0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..5d03b5971bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
 extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_indexagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_memoize;
-- 
2.43.0

0004-fix-tests-for-IndexAggregate.patchtext/x-patch; charset=UTF-8; name=0004-fix-tests-for-IndexAggregate.patchDownload
From 4cb972806327a837b42fdee614c19428b27785e4 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:44:14 +0300
Subject: [PATCH 4/4] fix tests for IndexAggregate

After adding the IndexAggregate node, some test output changed and tests
broke.  This patch updates the expected output.

It also adds some IndexAggregate-specific tests to aggregates.sql.
---
 src/test/regress/expected/aggregates.out      | 291 +++++++++++++++++-
 src/test/regress/expected/groupingsets.out    |  38 +--
 .../regress/expected/partition_aggregate.out  | 199 +++++-------
 src/test/regress/expected/select_parallel.out |  16 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/aggregates.sql           | 147 ++++++++-
 6 files changed, 524 insertions(+), 170 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index be0e1573183..2e0aead49ac 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1477,7 +1477,7 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select * from only t1 group by a,b,c,d;
       QUERY PLAN      
 ----------------------
- HashAggregate
+ IndexAggregate
    Group Key: a, b
    ->  Seq Scan on t1
 (3 rows)
@@ -3214,6 +3214,7 @@ FROM generate_series(1, 100) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
 EXPLAIN (COSTS OFF)
@@ -3651,10 +3652,242 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba       |    0 |     1
 (2 rows)
 
+ 
+--
+-- Index Aggregation tests
+--
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: unique1, (sum(two))
+   ->  IndexAggregate
+         Output: unique1, sum(two)
+         Group Key: tenk1.unique1
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ unique1 | sum 
+---------+-----
+       0 |   0
+       1 |   1
+       2 |   0
+       3 |   1
+       4 |   0
+       5 |   1
+       6 |   0
+       7 |   1
+       8 |   0
+       9 |   1
+(10 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, (sum(two))
+   ->  IndexAggregate
+         Output: even, sum(two)
+         Group Key: tenk1.even
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ even | sum 
+------+-----
+    1 |   0
+    3 | 100
+    5 |   0
+    7 | 100
+    9 |   0
+   11 | 100
+   13 |   0
+   15 | 100
+   17 |   0
+   19 | 100
+(10 rows)
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, odd, (sum(unique1))
+   ->  IndexAggregate
+         Output: even, odd, sum(unique1)
+         Group Key: tenk1.even, tenk1.odd
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+ even | odd |  sum   
+------+-----+--------
+    1 |   0 | 495000
+    3 |   2 | 495100
+    5 |   4 | 495200
+    7 |   6 | 495300
+    9 |   8 | 495400
+   11 |  10 | 495500
+   13 |  12 | 495600
+   15 |  14 | 495700
+   17 |  16 | 495800
+   19 |  18 | 495900
+(10 rows)
+
+-- mixing columns between group by and order by
+begin;
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.x, tmp.y
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+ x | y | sum 
+---+---+-----
+ 1 | 8 |   1
+ 2 | 7 |   2
+ 3 | 6 |   3
+ 4 | 5 |   4
+(4 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.y, tmp.x
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+ x | y | sum 
+---+---+-----
+ 4 | 5 |   4
+ 3 | 6 |   3
+ 2 | 7 |   2
+ 1 | 8 |   1
+(4 rows)
+
+--
+-- Index Aggregation Spill tests
+--
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+ unique1 | count | sum  
+---------+-------+------
+    4976 |     1 |  976
+    4977 |     1 |  977
+    4978 |     1 |  978
+    4979 |     1 |  979
+    4980 |     1 |  980
+    4981 |     1 |  981
+    4982 |     1 |  982
+    4983 |     1 |  983
+    4984 |     1 |  984
+    4985 |     1 |  985
+    4986 |     1 |  986
+    4987 |     1 |  987
+    4988 |     1 |  988
+    4989 |     1 |  989
+    4990 |     1 |  990
+    4991 |     1 |  991
+    4992 |     1 |  992
+    4993 |     1 |  993
+    4994 |     1 |  994
+    4995 |     1 |  995
+    4996 |     1 |  996
+    4997 |     1 |  997
+    4998 |     1 |  998
+    4999 |     1 |  999
+    9976 |     1 | 1976
+    9977 |     1 | 1977
+    9978 |     1 | 1978
+    9979 |     1 | 1979
+    9980 |     1 | 1980
+    9981 |     1 | 1981
+    9982 |     1 | 1982
+    9983 |     1 | 1983
+    9984 |     1 | 1984
+    9985 |     1 | 1985
+    9986 |     1 | 1986
+    9987 |     1 | 1987
+    9988 |     1 | 1988
+    9989 |     1 | 1989
+    9990 |     1 | 1990
+    9991 |     1 | 1991
+    9992 |     1 | 1992
+    9993 |     1 | 1993
+    9994 |     1 | 1994
+    9995 |     1 | 1995
+    9996 |     1 | 1996
+    9997 |     1 | 1997
+    9998 |     1 | 1998
+    9999 |     1 | 1999
+(48 rows)
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 --
 -- Hash Aggregation Spill tests
 --
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 select unique1, count(*), sum(twothousand) from tenk1
 group by unique1
@@ -3727,6 +3960,7 @@ select g from generate_series(0, 19999) g;
 analyze agg_data_20k;
 -- Produce results with sorting.
 set enable_hashagg = false;
+set enable_indexagg = false;
 set jit_above_cost = 0;
 explain (costs off)
 select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
@@ -3796,31 +4030,74 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
   from agg_data_2k group by g/2;
 set enable_sort = true;
 set work_mem to default;
+-- Produce results with index aggregation
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+set jit_above_cost = 0;
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+           QUERY PLAN           
+--------------------------------
+ IndexAggregate
+   Group Key: (g % 10000)
+   ->  Seq Scan on agg_data_20k
+(3 rows)
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+set jit_above_cost to default;
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
 -- Compare group aggregation results to hash aggregation results
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
  a | c1 | c2 | c3 
 ---+----+----+----
 (0 rows)
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
@@ -3833,3 +4110,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 39d35a195bc..46b80db6806 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -506,18 +506,15 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                         QUERY PLAN                        
 ----------------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_2))
-   Group Key: ((SubPlan expr_2))
-   ->  Sort
-         Output: ((SubPlan expr_2)), i1.q1
-         Sort Key: ((SubPlan expr_2))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_2), i1.q1
-               SubPlan expr_2
-                 ->  Result
-                       Output: i1.q1
-(11 rows)
+   Group Key: (SubPlan expr_2)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_2), i1.q1
+         SubPlan expr_2
+           ->  Result
+                 Output: i1.q1
+(8 rows)
 
 select grouping(ss.x)
 from int8_tbl i1
@@ -536,21 +533,18 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                    QUERY PLAN                   
 ------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: (SubPlan expr_1), ((SubPlan expr_3))
-   Group Key: ((SubPlan expr_3))
-   ->  Sort
-         Output: ((SubPlan expr_3)), i1.q1
-         Sort Key: ((SubPlan expr_3))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_3), i1.q1
-               SubPlan expr_3
-                 ->  Result
-                       Output: i1.q1
+   Group Key: (SubPlan expr_3)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_3), i1.q1
+         SubPlan expr_3
+           ->  Result
+                 Output: i1.q1
    SubPlan expr_1
      ->  Result
            Output: GROUPING((SubPlan expr_2))
-(14 rows)
+(11 rows)
 
 select (select grouping(ss.x))
 from int8_tbl i1
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index c30304b99c7..956abf9dc71 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -187,25 +187,19 @@ SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 O
  Sort
    Sort Key: pagg_tab.c, (sum(pagg_tab.a)), (avg(pagg_tab.b))
    ->  Append
-         ->  GroupAggregate
+         ->  IndexAggregate
                Group Key: pagg_tab.c
                Filter: (avg(pagg_tab.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab.c
-                     ->  Seq Scan on pagg_tab_p1 pagg_tab
-         ->  GroupAggregate
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+         ->  IndexAggregate
                Group Key: pagg_tab_1.c
                Filter: (avg(pagg_tab_1.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_1.c
-                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-         ->  GroupAggregate
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+         ->  IndexAggregate
                Group Key: pagg_tab_2.c
                Filter: (avg(pagg_tab_2.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_2.c
-                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(21 rows)
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(15 rows)
 
 SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
   c   | sum  |         avg         | count 
@@ -221,31 +215,18 @@ SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 O
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
-                            QUERY PLAN                            
-------------------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Sort
    Sort Key: pagg_tab.a, (sum(pagg_tab.b)), (avg(pagg_tab.b))
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: pagg_tab.a
          Filter: (avg(pagg_tab.d) < '15'::numeric)
-         ->  Merge Append
-               Sort Key: pagg_tab.a
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab.a
-                     ->  Sort
-                           Sort Key: pagg_tab.a
-                           ->  Seq Scan on pagg_tab_p1 pagg_tab
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab_1.a
-                     ->  Sort
-                           Sort Key: pagg_tab_1.a
-                           ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab_2.a
-                     ->  Sort
-                           Sort Key: pagg_tab_2.a
-                           ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(22 rows)
+         ->  Append
+               ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+(9 rows)
 
 SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
  a  | sum  |         avg         | count 
@@ -267,24 +248,19 @@ EXPLAIN (COSTS OFF)
 SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
                       QUERY PLAN                      
 ------------------------------------------------------
- Merge Append
+ Sort
    Sort Key: pagg_tab.c
-   ->  Group
-         Group Key: pagg_tab.c
-         ->  Sort
-               Sort Key: pagg_tab.c
+   ->  Append
+         ->  IndexAggregate
+               Group Key: pagg_tab.c
                ->  Seq Scan on pagg_tab_p1 pagg_tab
-   ->  Group
-         Group Key: pagg_tab_1.c
-         ->  Sort
-               Sort Key: pagg_tab_1.c
+         ->  IndexAggregate
+               Group Key: pagg_tab_1.c
                ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-   ->  Group
-         Group Key: pagg_tab_2.c
-         ->  Sort
-               Sort Key: pagg_tab_2.c
+         ->  IndexAggregate
+               Group Key: pagg_tab_2.c
                ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(17 rows)
+(12 rows)
 
 SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
   c   
@@ -305,31 +281,18 @@ SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
 
 EXPLAIN (COSTS OFF)
 SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
-                         QUERY PLAN                         
-------------------------------------------------------------
- Group
+                   QUERY PLAN                   
+------------------------------------------------
+ IndexAggregate
    Group Key: pagg_tab.a
-   ->  Merge Append
-         Sort Key: pagg_tab.a
-         ->  Group
-               Group Key: pagg_tab.a
-               ->  Sort
-                     Sort Key: pagg_tab.a
-                     ->  Seq Scan on pagg_tab_p1 pagg_tab
-                           Filter: (a < 3)
-         ->  Group
-               Group Key: pagg_tab_1.a
-               ->  Sort
-                     Sort Key: pagg_tab_1.a
-                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-                           Filter: (a < 3)
-         ->  Group
-               Group Key: pagg_tab_2.a
-               ->  Sort
-                     Sort Key: pagg_tab_2.a
-                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-                           Filter: (a < 3)
-(22 rows)
+   ->  Append
+         ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+               Filter: (a < 3)
+(9 rows)
 
 SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
  a 
@@ -345,24 +308,19 @@ SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
                          QUERY PLAN                         
 ------------------------------------------------------------
  Limit
-   ->  Merge Append
+   ->  Sort
          Sort Key: pagg_tab.c
-         ->  GroupAggregate
-               Group Key: pagg_tab.c
-               ->  Sort
-                     Sort Key: pagg_tab.c
+         ->  Append
+               ->  IndexAggregate
+                     Group Key: pagg_tab.c
                      ->  Seq Scan on pagg_tab_p1 pagg_tab
-         ->  GroupAggregate
-               Group Key: pagg_tab_1.c
-               ->  Sort
-                     Sort Key: pagg_tab_1.c
+               ->  IndexAggregate
+                     Group Key: pagg_tab_1.c
                      ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-         ->  GroupAggregate
-               Group Key: pagg_tab_2.c
-               ->  Sort
-                     Sort Key: pagg_tab_2.c
+               ->  IndexAggregate
+                     Group Key: pagg_tab_2.c
                      ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(18 rows)
+(13 rows)
 
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
  count 
@@ -556,43 +514,30 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 SET enable_hashagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                         QUERY PLAN                          
+-------------------------------------------------------------
  Sort
    Sort Key: t1.y, (sum(t1.x)), (count(*))
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: t1.y
          Filter: (avg(t1.x) > '10'::numeric)
-         ->  Merge Append
-               Sort Key: t1.y
-               ->  Partial GroupAggregate
-                     Group Key: t1.y
-                     ->  Sort
-                           Sort Key: t1.y
-                           ->  Hash Join
-                                 Hash Cond: (t1.x = t2.y)
-                                 ->  Seq Scan on pagg_tab1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab2_p1 t2
-               ->  Partial GroupAggregate
-                     Group Key: t1_1.y
-                     ->  Sort
-                           Sort Key: t1_1.y
-                           ->  Hash Join
-                                 Hash Cond: (t1_1.x = t2_1.y)
-                                 ->  Seq Scan on pagg_tab1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab2_p2 t2_1
-               ->  Partial GroupAggregate
-                     Group Key: t1_2.y
-                     ->  Sort
-                           Sort Key: t1_2.y
-                           ->  Hash Join
-                                 Hash Cond: (t2_2.y = t1_2.x)
-                                 ->  Seq Scan on pagg_tab2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab1_p3 t1_2
-(34 rows)
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1_1.x = t2_1.y)
+                     ->  Seq Scan on pagg_tab1_p1 t1_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 t2_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.x = t2_2.y)
+                     ->  Seq Scan on pagg_tab1_p2 t1_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 t2_2
+               ->  Hash Join
+                     Hash Cond: (t2_3.y = t1_3.x)
+                     ->  Seq Scan on pagg_tab2_p3 t2_3
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 t1_3
+(21 rows)
 
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
  y  | sum  | count 
@@ -839,16 +784,14 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOI
 -- Empty join relation because of empty outer side, no partitionwise agg plan
 EXPLAIN (COSTS OFF)
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
-                  QUERY PLAN                  
-----------------------------------------------
- GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: pagg_tab1.y
-   ->  Sort
-         Sort Key: pagg_tab1.y
-         ->  Result
-               Replaces: Join on b, pagg_tab1
-               One-Time Filter: false
-(7 rows)
+   ->  Result
+         Replaces: Join on b, pagg_tab1
+         One-Time Filter: false
+(5 rows)
 
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
  x | y | count 
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..aee7469dc1e 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -881,20 +881,16 @@ select * from
   (select string4, count(unique2)
    from tenk1 group by string4 order by string4) ss
   right join (values (1),(2),(3)) v(x) on true;
-                        QUERY PLAN                        
-----------------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop Left Join
    ->  Values Scan on "*VALUES*"
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: tenk1.string4
-         ->  Gather Merge
+         ->  Gather
                Workers Planned: 4
-               ->  Partial GroupAggregate
-                     Group Key: tenk1.string4
-                     ->  Sort
-                           Sort Key: tenk1.string4
-                           ->  Parallel Seq Scan on tenk1
-(11 rows)
+               ->  Parallel Seq Scan on tenk1
+(7 rows)
 
 select * from
   (select string4, count(unique2)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 3b37fafa65b..1c6ae8982ab 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -157,6 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashagg                 | on
  enable_hashjoin                | on
  enable_incremental_sort        | on
+ enable_indexagg                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
  enable_material                | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(25 rows)
+(26 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 77ca6ffa3a9..73f5e97164c 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1374,6 +1374,7 @@ CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -1605,12 +1606,100 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+ 
+--
+-- Index Aggregation tests
+--
+
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+-- mixing columns between group by and order by
+begin;
+
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+--
+-- Index Aggregation Spill tests
+--
+
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 
 --
 -- Hash Aggregation Spill tests
 --
 
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 
 select unique1, count(*), sum(twothousand) from tenk1
@@ -1639,6 +1728,7 @@ analyze agg_data_20k;
 -- Produce results with sorting.
 
 set enable_hashagg = false;
+set enable_indexagg = false;
 
 set jit_above_cost = 0;
 
@@ -1710,23 +1800,68 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
 set enable_sort = true;
 set work_mem to default;
 
+-- Produce results with index aggregation
+
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+
+set jit_above_cost = 0;
+
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+
+set jit_above_cost to default;
+
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
 -- Compare group aggregation results to hash aggregation results
 
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
 
 drop table agg_group_1;
 drop table agg_group_2;
@@ -1736,3 +1871,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
-- 
2.43.0

#2David Rowley
dgrowleyml@gmail.com
In reply to: Sergey Soloviev (#1)
Re: Introduce Index Aggregate - new GROUP BY strategy

On Tue, 9 Dec 2025 at 04:37, Sergey Soloviev
<sergey.soloviev@tantorlabs.ru> wrote:

I would like to introduce new GROUP BY strategy, called Index Aggregate.

In a nutshell, we build B+tree index where GROUP BY attributes are index
keys and if memory limit reached we will build index for each batch and
spill it to the disk as sorted run performing final external merge.
Mean IndexAgg time is about 1.8 ms and 2 ms for hash + sort, so win is about 10%.

Also, I have run TPC-H tests and 2 tests used Index Agg node (4 and 5) and this gave
near 5% gain in time.

Interesting.

Are you able to provide benchmarks with increasing numbers of groups,
say 100 to 100 million, increasing in multiples of 10, with say 1GB
work_mem, and to be fair, hash_mem_multiplier=1 with all 3 strategies.
A binary search's performance characteristics will differ vastly from
that of simplehash's hash lookup and linear probe type search. Binary
searches become much less optimal when the array becomes large as
there are many more opportunities for cache misses than with a linear
probing hash table. I think you're going to have to demonstrate that
the window where this is useful is big enough to warrant the extra
code.

Ideally, if you could show a graph and maybe name Hash Aggregate as
the baseline and show that as 1 always, then run the same benchmark
forcing a Sort -> Group Agg, and then also your Index Agg. Also,
ideally, if you could provide scripts for this so people can easily
run it themselves, to allow us to see how other hardware compares to
yours. Doing this may also help you move forward with your costing
code for the planner, but the main thing to show is that there is a
useful enough data size where this is useful.

You might want to repeat the test a few times with different data
types. Perhaps int or bigint, then also something varlena and maybe
something byref, such as UUID. Also, you might want to avoid presorted
data as I suspect it'll be hard to beat Sort -> Group Agg with
presorted data. Not causing performance regressions for presorted data
might be quite a tricky aspect of this patch.

David

#3Sergey Soloviev
sergey.soloviev@tantorlabs.ru
In reply to: David Rowley (#2)
4 attachment(s)
Re: Introduce Index Aggregate - new GROUP BY strategy

Hi!

Are you able to provide benchmarks

Yes, sure.

Test matrix:

- number of groups: from 100 to 1000000, increasing by a factor of 10
- different key types: int, bigint, uuid, text
- strategy: hash, group, index

For each key value there are 3 tuples with different 'j' values (to
exercise the aggregation logic).

Also, there is a test (called bigtext) with a large string as the key (each string is 4kB).

pgbench is used for testing. The test query looks like this:

    select i, sum(j) from TBL group by 1 order by 1;

Depending on the table size, the duration is set to between 1 and 3 minutes.
Everything is in the attached scripts:

- setup.sql - script to set up the environment (create tables, set GUCs).
              After running it you should restart the database.
              NOTE: for int and bigint the actual number of groups is
              less than the power of 10
- run_bench.sh - shell script that runs the test workload. It creates
                 files with the pgbench results.
- collect_results.sh - parses the output files and formats the result
                       table. The values shown are TPS.
- show_plan.sh - small script to run EXPLAIN for each benchmarked query

Finally, I have these tables:

int

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 3249.929602 | 3501.174072 | 3765.727121 |
| 1000    | 504.420643  | 501.465754  | 575.255906  |
| 10000   | 50.528155   | 49.312322   | 54.510261   |
| 100000  | 4.775069    | 4.317584    | 4.791735    |
| 1000000 | 0.405538    | 0.406698    | 0.321379    |

bigint

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 3225.287886 | 3510.612641 | 3742.911726 |
| 1000    | 492.908092  | 491.530184  | 574.475159  |
| 10000   | 50.192018   | 49.555983   | 53.909437   |
| 100000  | 4.831086    | 4.430059    | 4.748821    |
| 1000000 | 0.401983    | 0.413218    | 0.318144    |

text

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 2647.030876 | 2553.503954 | 2946.282525 |
| 1000    | 348.464373  | 286.818555  | 342.771923  |
| 10000   | 32.891834   | 24.386304   | 28.249571   |
| 100000  | 2.934513    | 1.956983    | 2.237997    |
| 1000000 | 0.249291    | 0.148780    | 0.150943    |

uuid

| amount  | HashAgg | GroupAgg    | IndexAgg    |
| ------- | ------- | ----------- | ----------- |
| 100     | N/A     | 2282.812585 | 2432.713816 |
| 1000    | N/A     | 282.637163  | 303.892131  |
| 10000   | N/A     | 28.375838   | 28.924711   |
| 100000  | N/A     | 2.649958    | 2.449907    |
| 1000000 | N/A     | 0.255203    | 0.194414    |

bigtext

| HashAgg | GroupAgg | IndexAgg |
| ------- | -------- | -------- |
| N/A     | 0.035247 | 0.041120 |

NOTE: I could not get a Hash + Sort plan for the uuid and bigtext tests,
and this reproduces even on upstream without this patch.

The main observation is that for a small number of groups
Index Aggregate performs better than the other strategies:

- int and bigint: even up to 100K keys
- text: only for 100 keys
- uuid: up to 10K keys
- bigtext: better than Group + Sort, but tested only with a large
   number of keys (100K)

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com

Attachments:

collect_results.shapplication/x-shellscript; name=collect_results.shDownload
run_bench.shapplication/x-shellscript; name=run_bench.shDownload
setup.sqlapplication/sql; name=setup.sqlDownload
show_plan.shapplication/x-shellscript; name=show_plan.shDownload
#4Сергей Соловьев
sergey.soloviev@tantorlabs.ru
In reply to: Sergey Soloviev (#3)
Re: Introduce Index Aggregate - new GROUP BY strategy

The previous message had bad table formatting. Here is the fixed version.

int

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 3249.929602 | 3501.174072 | 3765.727121 |
| 1000    | 504.420643  | 501.465754  | 575.255906  |
| 10000   | 50.528155   | 49.312322   | 54.510261   |
| 100000  | 4.775069    | 4.317584    | 4.791735    |
| 1000000 | 0.405538    | 0.406698    | 0.321379    |

bigint

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 3225.287886 | 3510.612641 | 3742.911726 |
| 1000    | 492.908092  | 491.530184  | 574.475159  |
| 10000   | 50.192018   | 49.555983   | 53.909437   |
| 100000  | 4.831086    | 4.430059    | 4.748821    |
| 1000000 | 0.401983    | 0.413218    | 0.318144    |

text

| amount  | HashAgg     | GroupAgg    | IndexAgg    |
| ------- | ----------- | ----------- | ----------- |
| 100     | 2647.030876 | 2553.503954 | 2946.282525 |
| 1000    | 348.464373  | 286.818555  | 342.771923  |
| 10000   | 32.891834   | 24.386304   | 28.249571   |
| 100000  | 2.934513    | 1.956983    | 2.237997    |
| 1000000 | 0.249291    | 0.148780    | 0.150943    |

uuid

| amount  | HashAgg | GroupAgg    | IndexAgg    |
| ------- | ------- | ----------- | ----------- |
| 100     | N/A     | 2282.812585 | 2432.713816 |
| 1000    | N/A     | 282.637163  | 303.892131  |
| 10000   | N/A     | 28.375838   | 28.924711   |
| 100000  | N/A     | 2.649958    | 2.449907    |
| 1000000 | N/A     | 0.255203    | 0.194414    |

bigtext

| HashAgg | GroupAgg | IndexAgg |
| ------- | -------- | -------- |
| N/A     | 0.035247 | 0.041120 |

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com

#5Sergey Soloviev
sergey.soloviev@tantorlabs.ru
In reply to: Сергей Соловьев (#4)
4 attachment(s)
Re: Introduce Index Aggregate - new GROUP BY strategy

Upstream has changed and the patches needed a rebase. Here are the updated patches.

Attachments:

0001-add-in-memory-btree-tuple-index.patchtext/x-patch; charset=UTF-8; name=0001-add-in-memory-btree-tuple-index.patchDownload
From e7db0d354de3bc8f4f6b7bcc4a273b15f623ba5e Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 15:25:41 +0300
Subject: [PATCH 1/4] add in-memory btree tuple index

This patch implements an in-memory B+tree structure. It will be used as
the index for a special type of grouping using an index.

The size of each node is set using a macro. For convenience it equals
2^n - 1, so for internal nodes we effectively calculate the size of each
page and find the split node (exactly in the middle), and for leaf nodes
we can distribute tuples across nodes uniformly (according to the newly
inserted tuple).

It supports separate memory contexts for tracking memory allocations.
And just like in TupleHashTable, during lookup it uses an 'isnew'
pointer to prevent new tuple creation (i.e. when the memory limit is
reached).

It also has key abbreviation optimization support, like tuplesort. But
some code was copied and looks exactly the same, so it may be worth
factoring such logic out into a separate function.
---
 src/backend/executor/execGrouping.c | 643 ++++++++++++++++++++++++++++
 src/include/executor/executor.h     |  65 +++
 src/include/nodes/execnodes.h       |  86 +++-
 3 files changed, 793 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 8eb4c25e1cb..c83a3f2223d 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -622,3 +622,646 @@ TupleHashTableMatch(struct tuplehash_hash *tb, MinimalTuple tuple1, MinimalTuple
 	econtext->ecxt_outertuple = slot1;
 	return !ExecQualAndReset(hashtable->cur_eq_func, econtext);
 }
+
+/*****************************************************************************
+ * 		Utility routines for all-in-memory btree index
+ * 
+ * These routines build a btree index for grouping tuples together (eg, for
+ * index aggregation).  There is one entry for each not-distinct set of
+ * tuples presented.
+ *****************************************************************************/
+
+/*
+ * Representation of an entry being searched for in the tuple index.  This is
+ * a separate representation to avoid unnecessary memory allocations for
+ * creating a MinimalTuple for a TupleIndexEntry.
+ */
+typedef struct TupleIndexSearchEntryData
+{
+	TupleTableSlot *slot;		/* search TupleTableSlot */
+	Datum	key1;				/* first searched key data */
+	bool	isnull1;			/* first searched key is null */
+} TupleIndexSearchEntryData;
+
+typedef TupleIndexSearchEntryData *TupleIndexSearchEntry;
+
+/* 
+ * compare_index_tuple_tiebreak
+ * 		Perform full comparison of tuples without key abbreviation.
+ * 
+ * Invoked if the first key (possibly abbreviated) cannot decide the
+ * comparison, so we have to compare all keys.
+ */
+static inline int
+compare_index_tuple_tiebreak(TupleIndex index, TupleIndexEntry left,
+							 TupleIndexSearchEntry right)
+{
+	HeapTupleData ltup;
+	SortSupport sortKey = index->sortKeys;
+	TupleDesc tupDesc = index->tupDesc;
+	AttrNumber	attno;
+	Datum		datum1,
+				datum2;
+	bool		isnull1,
+				isnull2;
+	int			cmp;
+
+	ltup.t_len = left->tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	ltup.t_data = (HeapTupleHeader) ((char *) left->tuple - MINIMAL_TUPLE_OFFSET);
+	tupDesc = index->tupDesc;
+
+	if (sortKey->abbrev_converter)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortAbbrevFullComparator(datum1, isnull1,
+											datum2, isnull2,
+											sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	sortKey++;
+	for (int nkey = 1; nkey < index->nkeys; nkey++, sortKey++)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortComparator(datum1, isnull1,
+								  datum2, isnull2,
+								  sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+	
+	return 0;
+}
+
+/* 
+ * compare_index_tuple
+ * 		Compare pair of tuples during index lookup
+ * 
+ * The comparison honors key abbreviation.
+ */
+static int
+compare_index_tuple(TupleIndex index,
+					TupleIndexEntry left,
+					TupleIndexSearchEntry right)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+	int	cmp = 0;
+	
+	cmp = ApplySortComparator(left->key1, left->isnull1,
+							  right->key1, right->isnull1,
+							  sortKey);
+	if (cmp != 0)
+		return cmp;
+
+	return compare_index_tuple_tiebreak(index, left, right);
+}
+
+/* 
+ * tuple_index_node_bsearch
+ * 		Perform binary search in the index node.
+ * 
+ * On return, if 'found' is set to 'true', then an exact match was found and
+ * the returned index is an index into the tuples array.  Otherwise the value
+ * is handled differently:
+ * - for internal nodes it is an index into the 'pointers' array to follow
+ * - for leaf nodes it is the index at which the new entry must be inserted.
+ */
+static int
+tuple_index_node_bsearch(TupleIndex index, TupleIndexNode node,
+						 TupleIndexSearchEntry search, bool *found)
+{
+	int low;
+	int high;
+	
+	low = 0;
+	high = node->ntuples;
+	*found = false;
+
+	while (low < high)
+	{
+		OffsetNumber mid = (low + high) / 2;
+		TupleIndexEntry mid_entry = node->tuples[mid];
+		int cmp;
+
+		cmp = compare_index_tuple(index, mid_entry, search);
+		if (cmp == 0)
+		{
+			*found = true;
+			return mid;
+		}
+
+		if (cmp < 0)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	return low;
+}
+
+static inline TupleIndexNode
+IndexLeafNodeGetNext(TupleIndexNode node)
+{
+	return node->pointers[0];
+}
+
+static inline void
+IndexLeafNodeSetNext(TupleIndexNode node, TupleIndexNode next)
+{
+	node->pointers[0] = next;
+}
+
+#define SizeofTupleIndexInternalNode \
+	  (offsetof(TupleIndexNodeData, pointers) \
+	+ (TUPLE_INDEX_NODE_MAX_ENTRIES + 1) * sizeof(TupleIndexNode))
+
+#define SizeofTupleIndexLeafNode \
+	offsetof(TupleIndexNodeData, pointers) + sizeof(TupleIndexNode)
+
+static inline TupleIndexNode
+AllocLeafIndexNode(TupleIndex index, TupleIndexNode next)
+{
+	TupleIndexNode leaf;
+	leaf = MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexLeafNode);
+	IndexLeafNodeSetNext(leaf, next);
+	return leaf;
+}
+
+static inline TupleIndexNode
+AllocInternalIndexNode(TupleIndex index)
+{
+	return MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexInternalNode);
+}
+
+/* 
+ * tuple_index_node_insert_at
+ * 		Insert new tuple in the node at specified index
+ * 
+ * This function is invoked when a new tuple must be inserted into the node
+ * (both leaf and internal). For internal nodes 'pointer' must also be
+ * specified.
+ *
+ * The node must have free space available. It's up to the caller to check
+ * whether the node is full and needs splitting; for a split use
+ * 'tuple_index_insert_split'.
+ */
+static inline void
+tuple_index_node_insert_at(TupleIndexNode node, bool is_leaf, int idx,
+						   TupleIndexEntry entry, TupleIndexNode pointer)
+{
+	int move_count;
+
+	Assert(node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES);
+	Assert(0 <= idx && idx <= node->ntuples);
+	move_count = node->ntuples - idx;
+
+	if (move_count > 0)
+		memmove(&node->tuples[idx + 1], &node->tuples[idx],
+			move_count * sizeof(TupleIndexEntry));
+
+	node->tuples[idx] = entry;
+
+	if (!is_leaf)
+	{
+		Assert(pointer != NULL);
+
+		if (move_count > 0)
+			memmove(&node->pointers[idx + 2], &node->pointers[idx + 1],
+					move_count * sizeof(TupleIndexNode));
+		node->pointers[idx + 1] = pointer;
+	}
+
+	node->ntuples++;
+}
+
+/* 
+ * Insert a tuple into a full node, performing a page split.
+ *
+ * 'split_node_out' - new page containing the entries on the right side
+ * 'split_entry_out' - entry which is sent to the parent node as the new
+ * separator key
+ */
+static void
+tuple_index_insert_split(TupleIndex index, TupleIndexNode node, bool is_leaf,
+						 int insert_pos, TupleIndexNode *split_node_out,
+						 TupleIndexEntry *split_entry_out)
+{
+	TupleIndexNode split;
+	int split_tuple_idx;
+
+	Assert(node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+	if (is_leaf)
+	{
+		/*
+		 * The maximum number of tuples is kept odd, so we need to decide at
+		 * which index to perform the page split. We know the split occurred
+		 * during an insert, so leave fewer entries on the page into which
+		 * the insertion must go.
+		 */
+		if (TUPLE_INDEX_NODE_MAX_ENTRIES / 2 < insert_pos)
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2 + 1;
+		else
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+
+		split = AllocLeafIndexNode(index, IndexLeafNodeGetNext(node));
+		split->ntuples = node->ntuples - split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[node->ntuples], 
+			   sizeof(TupleIndexEntry) * split->ntuples);
+		IndexLeafNodeSetNext(node, split);
+	}
+	else
+	{
+		/*
+		 * After a split on an internal node the split tuple is removed (it
+		 * is pushed up to the parent). The maximum number of tuples is odd,
+		 * so division by 2 handles it.
+		 */
+		split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+		split = AllocInternalIndexNode(index);
+		split->ntuples = split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[split_tuple_idx + 1],
+				sizeof(TupleIndexEntry) * split->ntuples);
+		memcpy(&split->pointers[0], &node->pointers[split_tuple_idx + 1],
+				sizeof(TupleIndexNode) * (split->ntuples + 1));
+	}
+
+	*split_node_out = split;
+	*split_entry_out = node->tuples[split_tuple_idx];
+}
+
+static inline Datum
+mintup_getattr(MinimalTuple tup, TupleDesc tupdesc, AttrNumber attnum, bool *isnull)
+{
+	HeapTupleData htup;
+
+	htup.t_len = tup->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tup - MINIMAL_TUPLE_OFFSET);
+
+	return heap_getattr(&htup, attnum, tupdesc, isnull);
+}
+
+static TupleIndexEntry
+tuple_index_node_lookup(TupleIndex index,
+						TupleIndexNode node, int level,
+						TupleIndexSearchEntry search, bool *is_new,
+						TupleIndexNode *split_node_out,
+						TupleIndexEntry *split_entry_out)
+{
+	TupleIndexEntry entry;
+	int idx;
+	bool found;
+	bool is_leaf;
+
+	TupleIndexNode insert_pointer;
+	TupleIndexEntry insert_entry;
+	bool need_insert;
+
+	Assert(level >= 0);
+
+	idx = tuple_index_node_bsearch(index, node, search, &found);
+	if (found)
+	{
+		/*
+		 * Both internal and leaf nodes store pointers to entries, so we can
+		 * safely return an exact match found at any level.
+		 */
+		if (is_new)
+			*is_new = false;
+		return node->tuples[idx];
+	}
+
+	is_leaf = level == 0;
+	if (is_leaf)
+	{
+		MemoryContext oldcxt;
+
+		if (is_new == NULL)
+			return NULL;
+
+		oldcxt = MemoryContextSwitchTo(index->tuplecxt);
+
+		entry = palloc(sizeof(TupleIndexEntryData));
+		entry->tuple = ExecCopySlotMinimalTupleExtra(search->slot, index->additionalsize);
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/*
+		 * key1 of the search tuple is stored in a TupleTableSlot, which has
+		 * its own lifetime, so we must not copy it from there.
+		 *
+		 * But if key abbreviation is in use, then we should copy it from the
+		 * search tuple: this is safe (pass-by-value), and recomputing the
+		 * abbreviation could skew its cost/benefit statistics.
+		 */
+		if (index->sortKeys->abbrev_converter)
+		{
+			entry->isnull1 = search->isnull1;
+			entry->key1 = search->key1;
+		}
+		else
+		{
+			SortSupport sortKey = &index->sortKeys[0];
+			entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+										 sortKey->ssup_attno, &entry->isnull1);
+		}
+
+		index->ntuples++;
+
+		*is_new = true;
+		need_insert = true;
+		insert_pointer = NULL;
+		insert_entry = entry;
+	}
+	else
+	{
+		TupleIndexNode child_split_node = NULL;
+		TupleIndexEntry child_split_entry;
+
+		entry = tuple_index_node_lookup(index, node->pointers[idx], level - 1,
+										search, is_new,
+										&child_split_node, &child_split_entry);
+		if (entry == NULL)
+			return NULL;
+
+		if (child_split_node != NULL)
+		{
+			need_insert = true;
+			insert_pointer = child_split_node;
+			insert_entry = child_split_entry;
+		}
+		else
+			need_insert = false;
+	}
+
+	if (need_insert)
+	{
+		Assert(insert_entry != NULL);
+
+		if (node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES)
+		{
+			TupleIndexNode split_node;
+			TupleIndexEntry split_entry;
+
+			tuple_index_insert_split(index, node, is_leaf, idx,
+									 &split_node, &split_entry);
+
+			/* adjust the insertion index if the tuple goes into the split page */
+			if (node->ntuples < idx)
+			{
+				/* keep split tuple for leaf nodes and remove for internal */
+				if (is_leaf)
+					idx -= node->ntuples;
+				else
+					idx -= node->ntuples + 1;
+
+				node = split_node;
+			}
+
+			*split_node_out = split_node;
+			*split_entry_out = split_entry;
+		}
+
+		Assert(idx >= 0);
+		tuple_index_node_insert_at(node, is_leaf, idx, insert_entry, insert_pointer);
+	}
+
+	return entry;
+}
+
+static void
+remove_index_abbreviations(TupleIndex index)
+{
+	TupleIndexIteratorData iter;
+	TupleIndexEntry entry;
+	SortSupport sortKey = &index->sortKeys[0];
+
+	sortKey->comparator = sortKey->abbrev_full_comparator;
+	sortKey->abbrev_converter = NULL;
+	sortKey->abbrev_abort = NULL;
+	sortKey->abbrev_full_comparator = NULL;
+
+	/* now traverse all index entries and convert all existing keys */
+	InitTupleIndexIterator(index, &iter);
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 sortKey->ssup_attno, &entry->isnull1);
+}
+
+static inline void
+prepare_search_index_tuple(TupleIndex index, TupleTableSlot *slot,
+						   TupleIndexSearchEntry entry)
+{
+	SortSupport	sortKey;
+
+	sortKey = &index->sortKeys[0];
+
+	entry->slot = slot;
+	entry->key1 = slot_getattr(slot, sortKey->ssup_attno, &entry->isnull1);
+
+	/* NULL can not be abbreviated */
+	if (entry->isnull1)
+		return;
+
+	/* abbreviation is not used */
+	if (!sortKey->abbrev_converter)
+		return;
+
+	/* check if abbreviation should be removed */
+	if (index->abbrevNext <= index->ntuples)
+	{
+		index->abbrevNext *= 2;
+
+		if (sortKey->abbrev_abort(index->ntuples, sortKey))
+		{
+			remove_index_abbreviations(index);
+			return;
+		}
+	}
+
+	entry->key1 = sortKey->abbrev_converter(entry->key1, sortKey);
+}
+
+TupleIndexEntry
+TupleIndexLookup(TupleIndex index, TupleTableSlot *searchslot, bool *is_new)
+{
+	TupleIndexEntry entry;
+	TupleIndexSearchEntryData search_entry;
+	TupleIndexNode split_node = NULL;
+	TupleIndexEntry split_entry;
+	TupleIndexNode new_root;
+
+	prepare_search_index_tuple(index, searchslot, &search_entry);
+
+	entry = tuple_index_node_lookup(index, index->root, index->height,
+									&search_entry, is_new, &split_node, &split_entry);
+
+	if (entry == NULL)
+		return NULL;
+
+	if (split_node == NULL)
+		return entry;
+
+	/* root split */
+	new_root = AllocInternalIndexNode(index);
+	new_root->ntuples = 1;
+	new_root->tuples[0] = split_entry;
+	new_root->pointers[0] = index->root;
+	new_root->pointers[1] = split_node;
+	index->root = new_root;
+	index->height++;
+
+	return entry;
+}
+
+void
+InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	TupleIndexNode min_node;
+	int level;
+
+	/* iterate to the left-most node */
+	min_node = index->root;
+	level = index->height;
+	while (level-- > 0)
+		min_node = min_node->pointers[0];
+
+	iter->cur_leaf = min_node;
+	iter->cur_idx = 0;
+}
+
+TupleIndexEntry
+TupleIndexIteratorNext(TupleIndexIterator iter)
+{
+	TupleIndexNode leaf = iter->cur_leaf;
+	TupleIndexEntry tuple;
+
+	if (leaf == NULL)
+		return NULL;
+
+	/* this also handles single empty root node case */
+	if (leaf->ntuples <= iter->cur_idx)
+	{
+		leaf = iter->cur_leaf = IndexLeafNodeGetNext(leaf);
+		if (leaf == NULL)
+			return NULL;
+		iter->cur_idx = 0;
+	}
+
+	tuple = leaf->tuples[iter->cur_idx];
+	iter->cur_idx++;
+	return tuple;
+}
+
+/* 
+ * Construct an empty TupleIndex
+ *
+ * inputDesc: tuple descriptor for input tuples
+ * nkeys: number of columns to be compared (length of next 4 arrays)
+ * attNums: attribute numbers used for grouping in sort order
+ * sortOperators: Oids of ordering operators used for comparisons
+ * sortCollations: collations used for comparisons
+ * nullsFirstFlags: whether NULLs sort before non-NULLs, for each column
+ * additionalsize: size of data that may be stored along with the index entry;
+ *				   used for storing per-trans information during aggregation
+ * metacxt: memory context for TupleIndex itself
+ * tuplecxt: memory context for storing MinimalTuples
+ * nodecxt: memory context for storing index nodes
+ */
+TupleIndex
+BuildTupleIndex(TupleDesc inputDesc,
+				int nkeys,
+				AttrNumber *attNums,
+				Oid *sortOperators,
+				Oid *sortCollations,
+				bool *nullsFirstFlags,
+				Size additionalsize,
+				MemoryContext metacxt,
+				MemoryContext tuplecxt,
+				MemoryContext nodecxt)
+{
+	TupleIndex index;
+	MemoryContext oldcxt;
+
+	Assert(nkeys > 0);
+
+	additionalsize = MAXALIGN(additionalsize);
+
+	oldcxt = MemoryContextSwitchTo(metacxt);
+
+	index = (TupleIndex) palloc(sizeof(TupleIndexData));
+	index->tuplecxt = tuplecxt;
+	index->nodecxt = nodecxt;
+	index->additionalsize = additionalsize;
+	index->tupDesc = CreateTupleDescCopy(inputDesc);
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->ntuples = 0;
+	index->height = 0;
+
+	index->nkeys = nkeys;
+	index->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (int i = 0; i < nkeys; ++i)
+	{
+		SortSupport sortKey = &index->sortKeys[i];
+
+		Assert(AttributeNumberIsValid(attNums[i]));
+		Assert(OidIsValid(sortOperators[i]));
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* abbreviation applies only for the first key */
+		sortKey->abbreviate = i == 0;
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/* Update abbreviation information */
+	if (index->sortKeys[0].abbrev_converter != NULL)
+	{
+		index->abbrevUsed = true;
+		index->abbrevNext = 10;
+		index->abbrevSortOp = sortOperators[0];
+	}
+	else
+		index->abbrevUsed = false;
+
+	MemoryContextSwitchTo(oldcxt);
+	return index;
+}
+
+/* 
+ * Resets contents of the index to be empty, preserving all the non-content
+ * state.
+ */
+void
+ResetTupleIndex(TupleIndex index)
+{
+	SortSupport ssup;
+
+	/* by this time indexcxt must be reset by the caller */
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->height = 0;
+	index->ntuples = 0;
+
+	if (!index->abbrevUsed)
+		return;
+
+	/*
+	 * If key abbreviation is used then we must reset its state.  All fields
+	 * in the SortSupport are already set up, but we should clear some of
+	 * them to make it look just as if this were the first-time setup.
+	 */
+	ssup = &index->sortKeys[0];
+	ssup->comparator = NULL;
+	PrepareSortSupportFromOrderingOp(index->abbrevSortOp, ssup);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..6192cc8d143 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -198,6 +198,71 @@ TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
 }
 #endif
 
+extern TupleIndex BuildTupleIndex(TupleDesc inputDesc,
+								  int nkeys,
+								  AttrNumber *attNums,
+								  Oid *sortOperators,
+								  Oid *sortCollations,
+								  bool *nullsFirstFlags,
+								  Size additionalsize,
+								  MemoryContext metacxt,
+								  MemoryContext tablecxt,
+								  MemoryContext nodecxt);
+extern TupleIndexEntry TupleIndexLookup(TupleIndex index, TupleTableSlot *search,
+										bool *is_new);
+extern void ResetTupleIndex(TupleIndex index);
+
+/*
+ * Start iterating over the tuples in the index; only ascending order is
+ * supported.  Modifying the index while iterating would break the iterator.
+ */
+extern void	InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter);
+extern TupleIndexEntry TupleIndexIteratorNext(TupleIndexIterator iter);
+static inline void
+ResetTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	InitTupleIndexIterator(index, iter);
+}
+
+#ifndef FRONTEND
+
+/* 
+ * Return size of the index entry. Useful for estimating memory usage.
+ */
+static inline size_t
+TupleIndexEntrySize(void)
+{
+	return sizeof(TupleIndexEntryData);
+}
+
+/* 
+ * Get a pointer to the additional space allocated for this entry. The
+ * memory will be maxaligned and zeroed.
+ * 
+ * The amount of space available is the additionalsize requested in the call
+ * to BuildTupleIndex(). If additionalsize was specified as zero, return
+ * NULL.
+ */
+static inline void *
+TupleIndexEntryGetAdditional(TupleIndex index, TupleIndexEntry entry)
+{
+	if (index->additionalsize > 0)
+		return (char *) (entry->tuple) - index->additionalsize;
+	else
+		return NULL;
+}
+
+/* 
+ * Return tuple from index entry
+ */
+static inline MinimalTuple
+TupleIndexEntryGetMinimalTuple(TupleIndexEntry entry)
+{
+	return entry->tuple;
+}
+
+#endif
+
 /*
  * prototypes from functions in execJunk.c
  */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..99ee472b51f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -900,7 +900,91 @@ typedef tuplehash_iterator TupleHashIterator;
 #define ScanTupleHashTable(htable, iter) \
 	tuplehash_iterate(htable->hashtab, iter)
 
-
+/* ---------------------------------------------------------------
+ *				Tuple Btree index
+ *
+ * All-in-memory tuple Btree index used for grouping and aggregating.
+ * ---------------------------------------------------------------
+ */
+
+/*
+ * Representation of a tuple in the index.  It stores both the tuple and
+ * the first-key information.  If key abbreviation is used, then key1
+ * stores the abbreviated key.
+ */
+typedef struct TupleIndexEntryData
+{
+	MinimalTuple tuple;	/* actual stored tuple */
+	Datum	key1;		/* value of first key */
+	bool	isnull1;	/* first key is null */
+} TupleIndexEntryData;
+
+typedef TupleIndexEntryData *TupleIndexEntry;
+
+/*
+ * Btree node of the tuple index.  Common to both internal and leaf nodes.
+ */
+typedef struct TupleIndexNodeData
+{
+	/* amount of tuples in the node */
+	int ntuples;
+
+/*
+ * Maximum number of tuples stored in a tuple index node.
+ *
+ * NOTE: use a 2^n - 1 count, so that all tuples fully utilize cache lines
+ *       (except the first, because of 'ntuples' padding)
+ */
+#define TUPLE_INDEX_NODE_MAX_ENTRIES  63
+
+	/*
+	 * Array of tuples for this page.
+	 *
+	 * For internal nodes these are separator keys; for leaf nodes, the
+	 * actual tuples.
+	 */
+	TupleIndexEntry tuples[TUPLE_INDEX_NODE_MAX_ENTRIES];
+
+	/*
+	 * For internal nodes this is an array of TUPLE_INDEX_NODE_MAX_ENTRIES + 1
+	 * pointers to the nodes below.
+	 *
+	 * For leaf nodes this is a single-element array: a pointer to the
+	 * sibling node, required for iteration.
+	 */
+	struct TupleIndexNodeData *pointers[FLEXIBLE_ARRAY_MEMBER];
+} TupleIndexNodeData;
+
+typedef TupleIndexNodeData *TupleIndexNode;
+
+typedef struct TupleIndexData
+{
+	TupleDesc	tupDesc;		/* descriptor for stored tuples */
+	TupleIndexNode root;		/* root of the tree */
+	int		height;				/* current tree height */
+	int		ntuples;			/* number of tuples in index */
+	int		nkeys;				/* amount of keys in tuple */
+	SortSupport	sortKeys;		/* support functions for key comparison */
+	MemoryContext	tuplecxt;	/* memory context containing tuples */
+	MemoryContext	nodecxt;	/* memory context containing index nodes */
+	Size	additionalsize;		/* size of additional data for tuple */
+	int		abbrevNext;			/* next time we should check abbreviation
+								 * optimization efficiency */
+	bool	abbrevUsed;			/* true if key abbreviation optimization
+								 * was ever used */
+	Oid		abbrevSortOp;		/* sort operator for first key */
+} TupleIndexData;
+
+typedef struct TupleIndexData *TupleIndex;
+
+typedef struct TupleIndexIteratorData
+{
+	TupleIndexNode	cur_leaf;	/* current leaf node */
+	int			cur_idx;		/* index of tuple to return next */
+} TupleIndexIteratorData;
+
+typedef TupleIndexIteratorData *TupleIndexIterator;
+
 /* ----------------------------------------------------------------
  *				 Expression State Nodes
  *
-- 
2.43.0

Attachment: 0003-make-use-of-IndexAggregate-in-planner-and-explain.patch (text/x-patch)
From 7e216b0c9554203899da48247d24d66309a1666f Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:34:18 +0300
Subject: [PATCH 3/4] make use of IndexAggregate in planner and explain

This commit adds support for IndexAggregate in the planner and in
EXPLAIN (ANALYZE).

We calculate the cost of an IndexAggregate and add an AGG_INDEX node to the
pathlist.  The cost of this node covers building the B+tree (in memory),
spilling to disk, and the final external merge.

For EXPLAIN the change is small: show the sort information in "Group Key".
---
 src/backend/commands/explain.c            | 101 ++++++++++++++++++----
 src/backend/optimizer/path/costsize.c     |  90 +++++++++++++++----
 src/backend/optimizer/plan/createplan.c   |  15 +++-
 src/backend/optimizer/plan/planner.c      |  35 ++++++++
 src/backend/optimizer/util/pathnode.c     |   9 ++
 src/backend/utils/misc/guc_parameters.dat |   7 ++
 src/include/nodes/pathnodes.h             |   3 +-
 src/include/optimizer/cost.h              |   1 +
 8 files changed, 222 insertions(+), 39 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..f9127761196 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -134,7 +134,7 @@ static void show_recursive_union_info(RecursiveUnionState *rstate,
 									  ExplainState *es);
 static void show_memoize_info(MemoizeState *mstate, List *ancestors,
 							  ExplainState *es);
-static void show_hashagg_info(AggState *aggstate, ExplainState *es);
+static void show_agg_spill_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1556,6 +1556,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						pname = "MixedAggregate";
 						strategy = "Mixed";
 						break;
+					case AGG_INDEX:
+						pname = "IndexAggregate";
+						strategy = "Indexed";
+						break;
 					default:
 						pname = "Aggregate ???";
 						strategy = "???";
@@ -2200,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
-			show_hashagg_info((AggState *) planstate, es);
+			show_agg_spill_info((AggState *) planstate, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
@@ -2631,6 +2635,24 @@ show_agg_keys(AggState *astate, List *ancestors,
 
 		if (plan->groupingSets)
 			show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+		else if (plan->aggstrategy == AGG_INDEX)
+		{
+			Sort	   *sort = astate->index_sort;
+
+			/*
+			 * Index Agg reorders the GROUP BY keys to match the ORDER BY, so
+			 * the keys themselves are the same, but we should still show the
+			 * other details of the ordering used, such as the sort direction.
+			 */
+			Assert(sort != NULL);
+			show_sort_group_keys(outerPlanState(astate), "Group Key",
+								 plan->numCols, 0,
+								 sort->sortColIdx,
+								 sort->sortOperators,
+								 sort->collations,
+								 sort->nullsFirst,
+								 ancestors, es);
+		}
 		else
 			show_sort_group_keys(outerPlanState(astate), "Group Key",
 								 plan->numCols, 0, plan->grpColIdx,
@@ -3735,47 +3757,67 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 }
 
 /*
- * Show information on hash aggregate memory usage and batches.
+ * Show information on hash or index aggregate memory usage and batches.
  */
 static void
-show_hashagg_info(AggState *aggstate, ExplainState *es)
+show_agg_spill_info(AggState *aggstate, ExplainState *es)
 {
 	Agg		   *agg = (Agg *) aggstate->ss.ps.plan;
-	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->hash_mem_peak);
+	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->spill_mem_peak);
 
 	if (agg->aggstrategy != AGG_HASHED &&
-		agg->aggstrategy != AGG_MIXED)
+		agg->aggstrategy != AGG_MIXED &&
+		agg->aggstrategy != AGG_INDEX)
 		return;
 
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		if (es->costs)
 			ExplainPropertyInteger("Planned Partitions", NULL,
-								   aggstate->hash_planned_partitions, es);
+								   aggstate->spill_planned_partitions, es);
 
 		/*
 		 * During parallel query the leader may have not helped out.  We
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			ExplainPropertyInteger("HashAgg Batches", NULL,
-								   aggstate->hash_batches_used, es);
+								   aggstate->spill_batches_used, es);
 			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
 			ExplainPropertyInteger("Disk Usage", "kB",
-								   aggstate->hash_disk_used, es);
+								   aggstate->spill_disk_used, es);
+		}
+
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64		spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			ExplainPropertyText("Merge Method", mergeMethod, es);
+			ExplainPropertyInteger("Merge Space Used", "kB", spaceUsed, es);
+			ExplainPropertyText("Merge Space Type", spaceType, es);
+		}
 	}
 	else
 	{
 		bool		gotone = false;
 
-		if (es->costs && aggstate->hash_planned_partitions > 0)
+		if (es->costs && aggstate->spill_planned_partitions > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str, "Planned Partitions: %d",
-							 aggstate->hash_planned_partitions);
+							 aggstate->spill_planned_partitions);
 			gotone = true;
 		}
 
@@ -3784,7 +3826,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			if (!gotone)
 				ExplainIndentText(es);
@@ -3792,17 +3834,44 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 				appendStringInfoSpaces(es->str, 2);
 
 			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-							 aggstate->hash_batches_used, memPeakKb);
+							 aggstate->spill_batches_used, memPeakKb);
 			gotone = true;
 
 			/* Only display disk usage if we spilled to disk */
-			if (aggstate->hash_batches_used > 1)
+			if (aggstate->spill_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-								 aggstate->hash_disk_used);
+								 aggstate->spill_disk_used);
 			}
 		}
 
+		/* For index aggregate, show stats for the final merge */
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64		spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			/*
+			 * If we get here, the previous check (for the memory peak) must
+			 * have succeeded: we cannot reach the merge without any in-memory
+			 * work.  Don't check other state; just start a new line.
+			 */
+			appendStringInfoChar(es->str, '\n');
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Merge Method: %s  %s: " INT64_FORMAT "kB",
+							 mergeMethod, spaceType, spaceUsed);
+			gotone = true;
+		}
+
 		if (gotone)
 			appendStringInfoChar(es->str, '\n');
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a39cc793b4d..ea1d18521b8 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool		enable_tidscan = true;
 bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
+bool		enable_indexagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_memoize = true;
@@ -1848,6 +1849,32 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 									rterm->pathtarget->width);
 }
 
+/*
+ * cost_tuplemerge
+ *		Add to *cost the disk cost of the external merge phase of a tuplesort.
+ */
+static void
+cost_tuplemerge(double availMem, double input_bytes, double ntuples,
+				Cost comparison_cost, Cost *cost)
+{
+	double		npages = ceil(input_bytes / BLCKSZ);
+	double		nruns = input_bytes / availMem;
+	double		mergeorder = tuplesort_merge_order(availMem);
+	double		log_runs;
+	double		npageaccesses;
+
+	/* Compute logM(r) as log(r) / log(M) */
+	if (nruns > mergeorder)
+		log_runs = ceil(log(nruns) / log(mergeorder));
+	else
+		log_runs = 1.0;
+
+	npageaccesses = 2.0 * npages * log_runs;
+
+	/* Assume 3/4ths of accesses are sequential, 1/4th are not */
+	*cost += npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
+}
+
 /*
  * cost_tuplesort
  *	  Determines and returns the cost of sorting a relation using tuplesort,
@@ -1922,11 +1949,6 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		/*
 		 * We'll have to use a disk-based sort of all the tuples
 		 */
-		double		npages = ceil(input_bytes / BLCKSZ);
-		double		nruns = input_bytes / sort_mem_bytes;
-		double		mergeorder = tuplesort_merge_order(sort_mem_bytes);
-		double		log_runs;
-		double		npageaccesses;
 
 		/*
 		 * CPU costs
@@ -1936,16 +1958,8 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		*startup_cost = comparison_cost * tuples * LOG2(tuples);
 
 		/* Disk costs */
-
-		/* Compute logM(r) as log(r) / log(M) */
-		if (nruns > mergeorder)
-			log_runs = ceil(log(nruns) / log(mergeorder));
-		else
-			log_runs = 1.0;
-		npageaccesses = 2.0 * npages * log_runs;
-		/* Assume 3/4ths of accesses are sequential, 1/4th are not */
-		*startup_cost += npageaccesses *
-			(seq_page_cost * 0.75 + random_page_cost * 0.25);
+		cost_tuplemerge(sort_mem_bytes, input_bytes, tuples, comparison_cost,
+						startup_cost);
 	}
 	else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
 	{
@@ -2770,7 +2784,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
-	else
+	else if (aggstrategy == AGG_HASHED)
 	{
 		/* must be AGG_HASHED */
 		startup_cost = input_total_cost;
@@ -2788,6 +2802,27 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
+	else
+	{
+		/* must be AGG_INDEX */
+		startup_cost = input_total_cost;
+		if (!enable_indexagg)
+			++disabled_nodes;
+
+		startup_cost += aggcosts->transCost.startup;
+		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
+		/* cost of btree building */
+		startup_cost += (2.0 * cpu_operator_cost * numGroupCols) /* comparison cost */
+						* LOG2(numGroups)	/* tree height = comparisons per tuple */
+						* input_tuples;		/* number of input tuples */
+		startup_cost += aggcosts->finalCost.startup;
+
+		total_cost = startup_cost;
+		total_cost += aggcosts->finalCost.per_tuple * numGroups;
+		/* cost of retrieving tuples from the index */
+		total_cost += cpu_tuple_cost * numGroups;
+		output_tuples = numGroups;
+	}
 
 	/*
 	 * Add the disk costs of hash aggregation that spills to disk.
@@ -2802,7 +2837,7 @@ cost_agg(Path *path, PlannerInfo *root,
 	 * Accrue writes (spilled tuples) to startup_cost and to total_cost;
 	 * accrue reads only to total_cost.
 	 */
-	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED)
+	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED || aggstrategy == AGG_INDEX)
 	{
 		double		pages;
 		double		pages_written = 0.0;
@@ -2823,8 +2858,8 @@ cost_agg(Path *path, PlannerInfo *root,
 		hashentrysize = hash_agg_entry_size(list_length(root->aggtransinfos),
 											input_width,
 											aggcosts->transitionSpace);
-		hash_agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
-							&ngroups_limit, &num_partitions);
+		agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
+					   &ngroups_limit, &num_partitions);
 
 		nbatches = Max((numGroups * hashentrysize) / mem_limit,
 					   numGroups / ngroups_limit);
@@ -2861,6 +2896,23 @@ cost_agg(Path *path, PlannerInfo *root,
 		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
 		startup_cost += spill_cost;
 		total_cost += spill_cost;
+
+		/*
+		 * Index agg also writes sorted runs to tape for further merging.
+		 */
+		if (aggstrategy == AGG_INDEX)
+		{
+			double	output_bytes;
+			Cost	comparison_cost;
+			
+			/* size of all projected tuples */
+			output_bytes = path->pathtarget->width * output_tuples;
+			/* default comparison cost */
+			comparison_cost = 2.0 * cpu_operator_cost;
+
+			cost_tuplemerge(work_mem * 1024.0, output_bytes, output_tuples,
+							comparison_cost, &startup_cost);
+		}
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bc417f93840..de9bb1ef30b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2158,6 +2158,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 	List	   *quals;
+	List	   *chain;
+	AttrNumber *grpColIdx;
 
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
@@ -2169,17 +2171,24 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 
 	quals = order_qual_clauses(root, best_path->qual);
 
+	grpColIdx = extract_grouping_cols(best_path->groupClause, subplan->targetlist);
+
+	/* For index aggregation we should consider the desired sorting order. */
+	if (best_path->aggstrategy == AGG_INDEX)
+		chain = list_make1(make_sort_from_groupcols(best_path->groupClause, grpColIdx, subplan));
+	else
+		chain = NIL;
+
 	plan = make_agg(tlist, quals,
 					best_path->aggstrategy,
 					best_path->aggsplit,
 					list_length(best_path->groupClause),
-					extract_grouping_cols(best_path->groupClause,
-										  subplan->targetlist),
+					grpColIdx,
 					extract_grouping_ops(best_path->groupClause),
 					extract_grouping_collations(best_path->groupClause,
 												subplan->targetlist),
 					NIL,
-					NIL,
+					chain,
 					best_path->numGroups,
 					best_path->transitionSpace,
 					subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8b22c30559b..cfd2f3ff3a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3877,6 +3877,21 @@ create_grouping_paths(PlannerInfo *root,
 			 (gd ? gd->any_hashable : grouping_is_hashable(root->processed_groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
+		/*
+		 * Determine whether we should consider an index-based implementation
+		 * of grouping.
+		 *
+		 * This is more restrictive than sort or hash alone: the grouping must
+		 * be not only sortable (to build the Btree) but also hashable, so
+		 * that we can spill tuples efficiently and later process each batch.
+		 */
+		if (gd == NULL &&
+			root->numOrderedAggs == 0 &&
+			parse->groupClause != NIL &&
+			grouping_is_sortable(root->processed_groupClause) &&
+			grouping_is_hashable(root->processed_groupClause))
+			flags |= GROUPING_CAN_USE_INDEX;
+
 		/*
 		 * Determine whether partial aggregation is possible.
 		 */
@@ -7108,6 +7123,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups = 0;
@@ -7329,6 +7345,25 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 	}
 
+	if (can_index)
+	{
+		/* 
+		 * Generate IndexAgg path.
+		 */
+		Assert(!parse->groupingSets);
+		add_path(grouped_rel, (Path *)
+				 create_agg_path(root,
+								 grouped_rel,
+								 cheapest_path,
+								 grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_SIMPLE,
+								 root->processed_groupClause,
+								 havingQual,
+								 agg_costs,
+								 dNumGroups));
+	}
+
 	/*
 	 * When partitionwise aggregate is used, we might have fully aggregated
 	 * paths in the partial pathlist, because add_paths_to_append_rel() will
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b6be4ddbd01..2bac26055a7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3030,6 +3030,15 @@ create_agg_path(PlannerInfo *root,
 		else
 			pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	}
+	else if (aggstrategy == AGG_INDEX)
+	{
+		/* 
+		 * When using index aggregation all grouping columns will be used as
+		 * comparator keys, so output is always sorted.
+		 */
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
+																root->processed_tlist);
+	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..776ccd9e2fd 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -868,6 +868,13 @@
   boot_val => 'true',
 },
 
+{ name => 'enable_indexagg', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of index aggregation plans.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'enable_indexagg',
+  boot_val => 'true',
+},
+
 { name => 'enable_indexonlyscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
   short_desc => 'Enables the planner\'s use of index-only-scan plans.',
   flags => 'GUC_EXPLAIN',
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..f4b2d35b1d9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -3518,7 +3518,8 @@ typedef struct JoinPathExtraData
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
-#define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_CAN_USE_INDEX		0x0004
+#define GROUPING_CAN_PARTIAL_AGG	0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..5d03b5971bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
 extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_indexagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_memoize;
-- 
2.43.0

Attachment: 0004-fix-tests-for-IndexAggregate.patch (text/x-patch; charset=UTF-8)
From a2837f395e699feedce640bc942e11dd51f4e728 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:44:14 +0300
Subject: [PATCH 4/4] fix tests for IndexAggregate

After adding the IndexAggregate node, some test output changed and tests
broke. This patch updates the expected output.

It also adds some IndexAggregate-specific tests to aggregates.sql.
---
 src/test/regress/expected/aggregates.out      | 291 +++++++++++++++++-
 src/test/regress/expected/groupingsets.out    |  38 +--
 .../regress/expected/partition_aggregate.out  | 199 +++++-------
 src/test/regress/expected/select_parallel.out |  16 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/aggregates.sql           | 147 ++++++++-
 6 files changed, 524 insertions(+), 170 deletions(-)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index cae8e7bca31..afe01f5da85 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1533,7 +1533,7 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select * from only t1 group by a,b,c,d;
       QUERY PLAN      
 ----------------------
- HashAggregate
+ IndexAggregate
    Group Key: a, b
    ->  Seq Scan on t1
 (3 rows)
@@ -3270,6 +3270,7 @@ FROM generate_series(1, 100) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
 EXPLAIN (COSTS OFF)
@@ -3707,10 +3708,242 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba       |    0 |     1
 (2 rows)
 
+ 
+--
+-- Index Aggregation tests
+--
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: unique1, (sum(two))
+   ->  IndexAggregate
+         Output: unique1, sum(two)
+         Group Key: tenk1.unique1
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ unique1 | sum 
+---------+-----
+       0 |   0
+       1 |   1
+       2 |   0
+       3 |   1
+       4 |   0
+       5 |   1
+       6 |   0
+       7 |   1
+       8 |   0
+       9 |   1
+(10 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, (sum(two))
+   ->  IndexAggregate
+         Output: even, sum(two)
+         Group Key: tenk1.even
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ even | sum 
+------+-----
+    1 |   0
+    3 | 100
+    5 |   0
+    7 | 100
+    9 |   0
+   11 | 100
+   13 |   0
+   15 | 100
+   17 |   0
+   19 | 100
+(10 rows)
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, odd, (sum(unique1))
+   ->  IndexAggregate
+         Output: even, odd, sum(unique1)
+         Group Key: tenk1.even, tenk1.odd
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+ even | odd |  sum   
+------+-----+--------
+    1 |   0 | 495000
+    3 |   2 | 495100
+    5 |   4 | 495200
+    7 |   6 | 495300
+    9 |   8 | 495400
+   11 |  10 | 495500
+   13 |  12 | 495600
+   15 |  14 | 495700
+   17 |  16 | 495800
+   19 |  18 | 495900
+(10 rows)
+
+-- mixing columns between group by and order by
+begin;
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.x, tmp.y
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+ x | y | sum 
+---+---+-----
+ 1 | 8 |   1
+ 2 | 7 |   2
+ 3 | 6 |   3
+ 4 | 5 |   4
+(4 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.y, tmp.x
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+ x | y | sum 
+---+---+-----
+ 4 | 5 |   4
+ 3 | 6 |   3
+ 2 | 7 |   2
+ 1 | 8 |   1
+(4 rows)
+
+--
+-- Index Aggregation Spill tests
+--
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+ unique1 | count | sum  
+---------+-------+------
+    4976 |     1 |  976
+    4977 |     1 |  977
+    4978 |     1 |  978
+    4979 |     1 |  979
+    4980 |     1 |  980
+    4981 |     1 |  981
+    4982 |     1 |  982
+    4983 |     1 |  983
+    4984 |     1 |  984
+    4985 |     1 |  985
+    4986 |     1 |  986
+    4987 |     1 |  987
+    4988 |     1 |  988
+    4989 |     1 |  989
+    4990 |     1 |  990
+    4991 |     1 |  991
+    4992 |     1 |  992
+    4993 |     1 |  993
+    4994 |     1 |  994
+    4995 |     1 |  995
+    4996 |     1 |  996
+    4997 |     1 |  997
+    4998 |     1 |  998
+    4999 |     1 |  999
+    9976 |     1 | 1976
+    9977 |     1 | 1977
+    9978 |     1 | 1978
+    9979 |     1 | 1979
+    9980 |     1 | 1980
+    9981 |     1 | 1981
+    9982 |     1 | 1982
+    9983 |     1 | 1983
+    9984 |     1 | 1984
+    9985 |     1 | 1985
+    9986 |     1 | 1986
+    9987 |     1 | 1987
+    9988 |     1 | 1988
+    9989 |     1 | 1989
+    9990 |     1 | 1990
+    9991 |     1 | 1991
+    9992 |     1 | 1992
+    9993 |     1 | 1993
+    9994 |     1 | 1994
+    9995 |     1 | 1995
+    9996 |     1 | 1996
+    9997 |     1 | 1997
+    9998 |     1 | 1998
+    9999 |     1 | 1999
+(48 rows)
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 --
 -- Hash Aggregation Spill tests
 --
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 select unique1, count(*), sum(twothousand) from tenk1
 group by unique1
@@ -3783,6 +4016,7 @@ select g from generate_series(0, 19999) g;
 analyze agg_data_20k;
 -- Produce results with sorting.
 set enable_hashagg = false;
+set enable_indexagg = false;
 set jit_above_cost = 0;
 explain (costs off)
 select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
@@ -3852,31 +4086,74 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
   from agg_data_2k group by g/2;
 set enable_sort = true;
 set work_mem to default;
+-- Produce results with index aggregation
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+set jit_above_cost = 0;
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+           QUERY PLAN           
+--------------------------------
+ IndexAggregate
+   Group Key: (g % 10000)
+   ->  Seq Scan on agg_data_20k
+(3 rows)
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+set jit_above_cost to default;
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
 -- Compare group aggregation results to hash aggregation results
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
  a | c1 | c2 | c3 
 ---+----+----+----
 (0 rows)
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
@@ -3889,3 +4166,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 39d35a195bc..46b80db6806 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -506,18 +506,15 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                         QUERY PLAN                        
 ----------------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_2))
-   Group Key: ((SubPlan expr_2))
-   ->  Sort
-         Output: ((SubPlan expr_2)), i1.q1
-         Sort Key: ((SubPlan expr_2))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_2), i1.q1
-               SubPlan expr_2
-                 ->  Result
-                       Output: i1.q1
-(11 rows)
+   Group Key: (SubPlan expr_2)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_2), i1.q1
+         SubPlan expr_2
+           ->  Result
+                 Output: i1.q1
+(8 rows)
 
 select grouping(ss.x)
 from int8_tbl i1
@@ -536,21 +533,18 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                    QUERY PLAN                   
 ------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: (SubPlan expr_1), ((SubPlan expr_3))
-   Group Key: ((SubPlan expr_3))
-   ->  Sort
-         Output: ((SubPlan expr_3)), i1.q1
-         Sort Key: ((SubPlan expr_3))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_3), i1.q1
-               SubPlan expr_3
-                 ->  Result
-                       Output: i1.q1
+   Group Key: (SubPlan expr_3)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_3), i1.q1
+         SubPlan expr_3
+           ->  Result
+                 Output: i1.q1
    SubPlan expr_1
      ->  Result
            Output: GROUPING((SubPlan expr_2))
-(14 rows)
+(11 rows)
 
 select (select grouping(ss.x))
 from int8_tbl i1
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index c30304b99c7..956abf9dc71 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -187,25 +187,19 @@ SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 O
  Sort
    Sort Key: pagg_tab.c, (sum(pagg_tab.a)), (avg(pagg_tab.b))
    ->  Append
-         ->  GroupAggregate
+         ->  IndexAggregate
                Group Key: pagg_tab.c
                Filter: (avg(pagg_tab.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab.c
-                     ->  Seq Scan on pagg_tab_p1 pagg_tab
-         ->  GroupAggregate
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+         ->  IndexAggregate
                Group Key: pagg_tab_1.c
                Filter: (avg(pagg_tab_1.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_1.c
-                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-         ->  GroupAggregate
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+         ->  IndexAggregate
                Group Key: pagg_tab_2.c
                Filter: (avg(pagg_tab_2.d) < '15'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_2.c
-                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(21 rows)
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(15 rows)
 
 SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
   c   | sum  |         avg         | count 
@@ -221,31 +215,18 @@ SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 O
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
-                            QUERY PLAN                            
-------------------------------------------------------------------
+                          QUERY PLAN                          
+--------------------------------------------------------------
  Sort
    Sort Key: pagg_tab.a, (sum(pagg_tab.b)), (avg(pagg_tab.b))
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: pagg_tab.a
          Filter: (avg(pagg_tab.d) < '15'::numeric)
-         ->  Merge Append
-               Sort Key: pagg_tab.a
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab.a
-                     ->  Sort
-                           Sort Key: pagg_tab.a
-                           ->  Seq Scan on pagg_tab_p1 pagg_tab
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab_1.a
-                     ->  Sort
-                           Sort Key: pagg_tab_1.a
-                           ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-               ->  Partial GroupAggregate
-                     Group Key: pagg_tab_2.a
-                     ->  Sort
-                           Sort Key: pagg_tab_2.a
-                           ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(22 rows)
+         ->  Append
+               ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+(9 rows)
 
 SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
  a  | sum  |         avg         | count 
@@ -267,24 +248,19 @@ EXPLAIN (COSTS OFF)
 SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
                       QUERY PLAN                      
 ------------------------------------------------------
- Merge Append
+ Sort
    Sort Key: pagg_tab.c
-   ->  Group
-         Group Key: pagg_tab.c
-         ->  Sort
-               Sort Key: pagg_tab.c
+   ->  Append
+         ->  IndexAggregate
+               Group Key: pagg_tab.c
                ->  Seq Scan on pagg_tab_p1 pagg_tab
-   ->  Group
-         Group Key: pagg_tab_1.c
-         ->  Sort
-               Sort Key: pagg_tab_1.c
+         ->  IndexAggregate
+               Group Key: pagg_tab_1.c
                ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-   ->  Group
-         Group Key: pagg_tab_2.c
-         ->  Sort
-               Sort Key: pagg_tab_2.c
+         ->  IndexAggregate
+               Group Key: pagg_tab_2.c
                ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(17 rows)
+(12 rows)
 
 SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
   c   
@@ -305,31 +281,18 @@ SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
 
 EXPLAIN (COSTS OFF)
 SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
-                         QUERY PLAN                         
-------------------------------------------------------------
- Group
+                   QUERY PLAN                   
+------------------------------------------------
+ IndexAggregate
    Group Key: pagg_tab.a
-   ->  Merge Append
-         Sort Key: pagg_tab.a
-         ->  Group
-               Group Key: pagg_tab.a
-               ->  Sort
-                     Sort Key: pagg_tab.a
-                     ->  Seq Scan on pagg_tab_p1 pagg_tab
-                           Filter: (a < 3)
-         ->  Group
-               Group Key: pagg_tab_1.a
-               ->  Sort
-                     Sort Key: pagg_tab_1.a
-                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-                           Filter: (a < 3)
-         ->  Group
-               Group Key: pagg_tab_2.a
-               ->  Sort
-                     Sort Key: pagg_tab_2.a
-                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-                           Filter: (a < 3)
-(22 rows)
+   ->  Append
+         ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+               Filter: (a < 3)
+(9 rows)
 
 SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
  a 
@@ -345,24 +308,19 @@ SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
                          QUERY PLAN                         
 ------------------------------------------------------------
  Limit
-   ->  Merge Append
+   ->  Sort
          Sort Key: pagg_tab.c
-         ->  GroupAggregate
-               Group Key: pagg_tab.c
-               ->  Sort
-                     Sort Key: pagg_tab.c
+         ->  Append
+               ->  IndexAggregate
+                     Group Key: pagg_tab.c
                      ->  Seq Scan on pagg_tab_p1 pagg_tab
-         ->  GroupAggregate
-               Group Key: pagg_tab_1.c
-               ->  Sort
-                     Sort Key: pagg_tab_1.c
+               ->  IndexAggregate
+                     Group Key: pagg_tab_1.c
                      ->  Seq Scan on pagg_tab_p2 pagg_tab_1
-         ->  GroupAggregate
-               Group Key: pagg_tab_2.c
-               ->  Sort
-                     Sort Key: pagg_tab_2.c
+               ->  IndexAggregate
+                     Group Key: pagg_tab_2.c
                      ->  Seq Scan on pagg_tab_p3 pagg_tab_2
-(18 rows)
+(13 rows)
 
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
  count 
@@ -556,43 +514,30 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 SET enable_hashagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                         QUERY PLAN                          
+-------------------------------------------------------------
  Sort
    Sort Key: t1.y, (sum(t1.x)), (count(*))
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: t1.y
          Filter: (avg(t1.x) > '10'::numeric)
-         ->  Merge Append
-               Sort Key: t1.y
-               ->  Partial GroupAggregate
-                     Group Key: t1.y
-                     ->  Sort
-                           Sort Key: t1.y
-                           ->  Hash Join
-                                 Hash Cond: (t1.x = t2.y)
-                                 ->  Seq Scan on pagg_tab1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab2_p1 t2
-               ->  Partial GroupAggregate
-                     Group Key: t1_1.y
-                     ->  Sort
-                           Sort Key: t1_1.y
-                           ->  Hash Join
-                                 Hash Cond: (t1_1.x = t2_1.y)
-                                 ->  Seq Scan on pagg_tab1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab2_p2 t2_1
-               ->  Partial GroupAggregate
-                     Group Key: t1_2.y
-                     ->  Sort
-                           Sort Key: t1_2.y
-                           ->  Hash Join
-                                 Hash Cond: (t2_2.y = t1_2.x)
-                                 ->  Seq Scan on pagg_tab2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on pagg_tab1_p3 t1_2
-(34 rows)
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1_1.x = t2_1.y)
+                     ->  Seq Scan on pagg_tab1_p1 t1_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 t2_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.x = t2_2.y)
+                     ->  Seq Scan on pagg_tab1_p2 t1_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 t2_2
+               ->  Hash Join
+                     Hash Cond: (t2_3.y = t1_3.x)
+                     ->  Seq Scan on pagg_tab2_p3 t2_3
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 t1_3
+(21 rows)
 
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
  y  | sum  | count 
@@ -839,16 +784,14 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOI
 -- Empty join relation because of empty outer side, no partitionwise agg plan
 EXPLAIN (COSTS OFF)
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
-                  QUERY PLAN                  
-----------------------------------------------
- GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: pagg_tab1.y
-   ->  Sort
-         Sort Key: pagg_tab1.y
-         ->  Result
-               Replaces: Join on b, pagg_tab1
-               One-Time Filter: false
-(7 rows)
+   ->  Result
+         Replaces: Join on b, pagg_tab1
+         One-Time Filter: false
+(5 rows)
 
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
  x | y | count 
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..aee7469dc1e 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -881,20 +881,16 @@ select * from
   (select string4, count(unique2)
    from tenk1 group by string4 order by string4) ss
   right join (values (1),(2),(3)) v(x) on true;
-                        QUERY PLAN                        
-----------------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Nested Loop Left Join
    ->  Values Scan on "*VALUES*"
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: tenk1.string4
-         ->  Gather Merge
+         ->  Gather
                Workers Planned: 4
-               ->  Partial GroupAggregate
-                     Group Key: tenk1.string4
-                     ->  Sort
-                           Sort Key: tenk1.string4
-                           ->  Parallel Seq Scan on tenk1
-(11 rows)
+               ->  Parallel Seq Scan on tenk1
+(7 rows)
 
 select * from
   (select string4, count(unique2)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..d32bec316d3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -157,6 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashagg                 | on
  enable_hashjoin                | on
  enable_incremental_sort        | on
+ enable_indexagg                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
  enable_material                | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(25 rows)
+(26 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 850f5a5787f..f72eb367112 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1392,6 +1392,7 @@ CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -1623,12 +1624,100 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+ 
+--
+-- Index Aggregation tests
+--
+
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+-- mixing columns between group by and order by
+begin;
+
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+--
+-- Index Aggregation Spill tests
+--
+
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 
 --
 -- Hash Aggregation Spill tests
 --
 
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 
 select unique1, count(*), sum(twothousand) from tenk1
@@ -1657,6 +1746,7 @@ analyze agg_data_20k;
 -- Produce results with sorting.
 
 set enable_hashagg = false;
+set enable_indexagg = false;
 
 set jit_above_cost = 0;
 
@@ -1728,23 +1818,68 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
 set enable_sort = true;
 set work_mem to default;
 
+-- Produce results with index aggregation
+
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+
+set jit_above_cost = 0;
+
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+
+set jit_above_cost to default;
+
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
 -- Compare group aggregation results to hash aggregation results
 
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
 
 drop table agg_group_1;
 drop table agg_group_2;
@@ -1754,3 +1889,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
-- 
2.43.0

Attachment: 0002-introduce-AGG_INDEX-grouping-strategy-node.patch (text/x-patch)
From 2986764514f2310bfe2d1d7d2eacb4e4096e76f8 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 16:41:58 +0300
Subject: [PATCH 2/4] introduce AGG_INDEX grouping strategy node

AGG_INDEX is a new grouping strategy that builds an in-memory index and
uses it for grouping. The main advantage of this approach is that the
output is ordered by the grouping columns; if an ORDER BY is specified,
it is taken into account when building the grouping/sorting columns.

The index is the B+tree implemented in the previous commit, and the
overall implementation is very close to AGG_HASHED:

- maintain an in-memory grouping structure
- track memory consumption
- if the memory limit is reached, spill data to disk in batches (using a
  hash of the key columns)
- process the batches one after another, filling a new in-memory
  structure for each

For this reason much of the code is generalized to support both the
index and hash implementations: functions are generalized with boolean
arguments (e.g. 'ishash'), spill-related members of AggState are renamed
with the prefix 'spill_' instead of 'hash_', etc.

Most of the differences are in the spill logic: to preserve the sort
order in case of a disk spill, we must dump each index to disk to create
sorted runs and then perform a final external merge.

One problem is the external merge. It is adapted from tuplesort.c by
introducing a new operational mode, tuplemerge (with its own prefix).
Internally we just set up the state accordingly and proceed as before,
without any significant code changes.

Another problem is which tuples to save into the sorted runs. We decided
to store tuples after projection (when their aggregates are finalized),
because the internal transition state is represented by a
value/isnull/novalue triple (in AggStatePerGroupData) that is quite hard
to serialize and handle. After projection all GROUP BY attributes are
preserved, so we can still access them during the merge. Projection also
applies the filter, so it can discard some tuples.
---
 src/backend/executor/execExpr.c            |   31 +-
 src/backend/executor/nodeAgg.c             | 1378 +++++++++++++++++---
 src/backend/utils/sort/tuplesort.c         |  209 ++-
 src/backend/utils/sort/tuplesortvariants.c |  105 ++
 src/include/executor/executor.h            |   10 +-
 src/include/executor/nodeAgg.h             |   33 +-
 src/include/nodes/execnodes.h              |   61 +-
 src/include/nodes/nodes.h                  |    1 +
 src/include/nodes/plannodes.h              |    2 +-
 src/include/utils/tuplesort.h              |   17 +-
 10 files changed, 1618 insertions(+), 229 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c35744b105e..117d7ba31d0 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -94,7 +94,7 @@ static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 								  ExprEvalStep *scratch,
 								  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-								  int transno, int setno, int setoff, bool ishash,
+								  int transno, int setno, int setoff, int strategy,
 								  bool nullcheck);
 static void ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 							 Datum *resv, bool *resnull,
@@ -3667,7 +3667,7 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
  */
 ExprState *
 ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
-				  bool doSort, bool doHash, bool nullcheck)
+				  int groupStrategy, bool nullcheck)
 {
 	ExprState  *state = makeNode(ExprState);
 	PlanState  *parent = &aggstate->ss.ps;
@@ -3925,7 +3925,7 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		 * grouping set). Do so for both sort and hash based computations, as
 		 * applicable.
 		 */
-		if (doSort)
+		if (groupStrategy & GROUPING_STRATEGY_SORT)
 		{
 			int			processGroupingSets = Max(phase->numsets, 1);
 			int			setoff = 0;
@@ -3933,13 +3933,13 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < processGroupingSets; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, false,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_SORT, nullcheck);
 				setoff++;
 			}
 		}
 
-		if (doHash)
+		if (groupStrategy & GROUPING_STRATEGY_HASH)
 		{
 			int			numHashes = aggstate->num_hashes;
 			int			setoff;
@@ -3953,12 +3953,19 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < numHashes; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, true,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_HASH, nullcheck);
 				setoff++;
 			}
 		}
 
+		if (groupStrategy & GROUPING_STRATEGY_INDEX)
+		{
+			ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
+								  pertrans, transno, 0, 0,
+								  GROUPING_STRATEGY_INDEX, nullcheck);
+		}
+
 		/* adjust early bail out jump target(s) */
 		foreach(bail, adjust_bailout)
 		{
@@ -4011,16 +4018,18 @@ static void
 ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 					  ExprEvalStep *scratch,
 					  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-					  int transno, int setno, int setoff, bool ishash,
+					  int transno, int setno, int setoff, int strategy,
 					  bool nullcheck)
 {
 	ExprContext *aggcontext;
 	int			adjust_jumpnull = -1;
 
-	if (ishash)
+	if (strategy & GROUPING_STRATEGY_HASH)
 		aggcontext = aggstate->hashcontext;
-	else
+	else if (strategy & GROUPING_STRATEGY_SORT)
 		aggcontext = aggstate->aggcontexts[setno];
+	else
+		aggcontext = aggstate->indexcontext;
 
 	/* add check for NULL pointer? */
 	if (nullcheck)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index a18556f62ec..c5c6b7bfce9 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -364,7 +364,7 @@ typedef struct FindColsContext
 	Bitmapset  *unaggregated;	/* other column references */
 } FindColsContext;
 
-static void select_current_set(AggState *aggstate, int setno, bool is_hash);
+static void select_current_set(AggState *aggstate, int setno, int strategy);
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
 static void initialize_aggregates(AggState *aggstate,
@@ -403,8 +403,8 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
 static bool find_cols_walker(Node *node, FindColsContext *context);
 static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
-static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
-										  bool nullcheck);
+static void agg_recompile_expressions(AggState *aggstate, bool minslot,
+									  bool nullcheck);
 static void hash_create_memory(AggState *aggstate);
 static double hash_choose_num_buckets(double hashentrysize,
 									  double ngroups, Size memory);
@@ -431,13 +431,13 @@ static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
 									   int64 input_tuples, double input_card,
 									   int used_bits);
 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
-static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
-							   int used_bits, double input_groups,
-							   double hashentrysize);
-static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-								TupleTableSlot *inputslot, uint32 hash);
-static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
-								 int setno);
+static void agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
+						   int used_bits, double input_groups,
+						   double hashentrysize);
+static Size agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+							TupleTableSlot *inputslot, uint32 hash);
+static void agg_spill_finish(AggState *aggstate, HashAggSpill *spill,
+							 int setno);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  AggState *aggstate, EState *estate,
@@ -446,21 +446,27 @@ static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  Oid aggdeserialfn, Datum initValue,
 									  bool initValueIsNull, Oid *inputTypes,
 									  int numArguments);
-
+static void agg_fill_index(AggState *state);
+static TupleTableSlot *agg_retrieve_index(AggState *state);
+static void lookup_index_entries(AggState *state);
+static void indexagg_finish_initial_spills(AggState *aggstate);
+static void index_agg_enter_spill_mode(AggState *aggstate);
 
 /*
  * Select the current grouping set; affects current_set and
  * curaggcontext.
  */
 static void
-select_current_set(AggState *aggstate, int setno, bool is_hash)
+select_current_set(AggState *aggstate, int setno, int strategy)
 {
 	/*
 	 * When changing this, also adapt ExecAggPlainTransByVal() and
 	 * ExecAggPlainTransByRef().
 	 */
-	if (is_hash)
+	if (strategy == GROUPING_STRATEGY_HASH)
 		aggstate->curaggcontext = aggstate->hashcontext;
+	else if (strategy == GROUPING_STRATEGY_INDEX)
+		aggstate->curaggcontext = aggstate->indexcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -680,7 +686,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerGroup pergroup = pergroups[setno];
 
-		select_current_set(aggstate, setno, false);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_SORT);
 
 		for (transno = 0; transno < numTrans; transno++)
 		{
@@ -1478,7 +1484,7 @@ build_hash_tables(AggState *aggstate)
 			continue;
 		}
 
-		memory = aggstate->hash_mem_limit / aggstate->num_hashes;
+		memory = aggstate->spill_mem_limit / aggstate->num_hashes;
 
 		/* choose reasonable number of buckets per hashtable */
 		nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
@@ -1496,7 +1502,7 @@ build_hash_tables(AggState *aggstate)
 		build_hash_table(aggstate, setno, nbuckets);
 	}
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 }
 
 /*
@@ -1728,7 +1734,7 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 }
 
 /*
- * hashagg_recompile_expressions()
+ * agg_recompile_expressions()
  *
  * Identifies the right phase, compiles the right expression given the
  * arguments, and then sets phase->evalfunc to that expression.
@@ -1746,34 +1752,47 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
  * expressions in the AggStatePerPhase, and reuse when appropriate.
  */
 static void
-hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
+agg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 {
 	AggStatePerPhase phase;
 	int			i = minslot ? 1 : 0;
 	int			j = nullcheck ? 1 : 0;
 
 	Assert(aggstate->aggstrategy == AGG_HASHED ||
-		   aggstate->aggstrategy == AGG_MIXED);
+		   aggstate->aggstrategy == AGG_MIXED ||
+		   aggstate->aggstrategy == AGG_INDEX);
 
-	if (aggstate->aggstrategy == AGG_HASHED)
-		phase = &aggstate->phases[0];
-	else						/* AGG_MIXED */
+	if (aggstate->aggstrategy == AGG_MIXED)
 		phase = &aggstate->phases[1];
+	else						/* AGG_HASHED or AGG_INDEX */
+		phase = &aggstate->phases[0];
 
 	if (phase->evaltrans_cache[i][j] == NULL)
 	{
 		const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
 		bool		outerfixed = aggstate->ss.ps.outeropsfixed;
-		bool		dohash = true;
-		bool		dosort = false;
+		int			strategy = 0;
 
-		/*
-		 * If minslot is true, that means we are processing a spilled batch
-		 * (inside agg_refill_hash_table()), and we must not advance the
-		 * sorted grouping sets.
-		 */
-		if (aggstate->aggstrategy == AGG_MIXED && !minslot)
-			dosort = true;
+		switch (aggstate->aggstrategy)
+		{
+			case AGG_MIXED:
+				/*
+				 * If minslot is true, that means we are processing a spilled batch
+				 * (inside agg_refill_hash_table()), and we must not advance the
+				 * sorted grouping sets.
+				 */
+				if (!minslot)
+					strategy |= GROUPING_STRATEGY_SORT;
+				/* FALLTHROUGH */
+			case AGG_HASHED:
+				strategy |= GROUPING_STRATEGY_HASH;
+				break;
+			case AGG_INDEX:
+				strategy |= GROUPING_STRATEGY_INDEX;
+				break;
+			default:
+				Assert(false);
+		}
 
 		/* temporarily change the outerops while compiling the expression */
 		if (minslot)
@@ -1783,8 +1802,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 		}
 
 		phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
-														 dosort, dohash,
-														 nullcheck);
+														 strategy, nullcheck);
 
 		/* change back */
 		aggstate->ss.ps.outerops = outerops;
@@ -1803,9 +1821,9 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
  * substantially larger than the initial value.
  */
 void
-hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
-					Size *mem_limit, uint64 *ngroups_limit,
-					int *num_partitions)
+agg_set_limits(double hashentrysize, double input_groups, int used_bits,
+			   Size *mem_limit, uint64 *ngroups_limit,
+			   int *num_partitions)
 {
 	int			npartitions;
 	Size		partition_mem;
@@ -1853,6 +1871,18 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		*ngroups_limit = 1;
 }
 
+static inline bool
+agg_spill_required(AggState *aggstate, Size total_mem)
+{
+	/*
+	 * Don't spill unless there's at least one group in the hash table so we
+	 * can be sure to make progress even in edge cases.
+	 */
+	return aggstate->spill_ngroups_current > 0 &&
+			(total_mem > aggstate->spill_mem_limit ||
+			 aggstate->spill_ngroups_current > aggstate->spill_ngroups_limit);
+}
+
 /*
  * hash_agg_check_limits
  *
@@ -1863,7 +1893,6 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 static void
 hash_agg_check_limits(AggState *aggstate)
 {
-	uint64		ngroups = aggstate->hash_ngroups_current;
 	Size		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
 													 true);
 	Size		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt,
@@ -1874,7 +1903,7 @@ hash_agg_check_limits(AggState *aggstate)
 	bool		do_spill = false;
 
 #ifdef USE_INJECTION_POINTS
-	if (ngroups >= 1000)
+	if (aggstate->spill_ngroups_current >= 1000)
 	{
 		if (IS_INJECTION_POINT_ATTACHED("hash-aggregate-spill-1000"))
 		{
@@ -1888,9 +1917,7 @@ hash_agg_check_limits(AggState *aggstate)
 	 * Don't spill unless there's at least one group in the hash table so we
 	 * can be sure to make progress even in edge cases.
 	 */
-	if (aggstate->hash_ngroups_current > 0 &&
-		(total_mem > aggstate->hash_mem_limit ||
-		 ngroups > aggstate->hash_ngroups_limit))
+	if (agg_spill_required(aggstate, total_mem))
 	{
 		do_spill = true;
 	}
@@ -1899,97 +1926,199 @@ hash_agg_check_limits(AggState *aggstate)
 		hash_agg_enter_spill_mode(aggstate);
 }
 
+static void
+index_agg_check_limits(AggState *aggstate)
+{
+	Size		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt,
+													 true);
+	Size		node_mem = MemoryContextMemAllocated(aggstate->index_nodecxt,
+													 true);
+	Size		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt,
+													  true);
+	Size		tval_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory,
+													 true);
+	Size		total_mem = meta_mem + node_mem + entry_mem + tval_mem;
+	bool		do_spill = false;
+
+#ifdef USE_INJECTION_POINTS
+	if (aggstate->spill_ngroups_current >= 1000)
+	{
+		if (IS_INJECTION_POINT_ATTACHED("index-aggregate-spill-1000"))
+		{
+			do_spill = true;
+			INJECTION_POINT_CACHED("index-aggregate-spill-1000", NULL);
+		}
+	}
+#endif
+
+	if (agg_spill_required(aggstate, total_mem))
+	{
+		do_spill = true;
+	}
+
+	if (do_spill)
+		index_agg_enter_spill_mode(aggstate);
+}
+
 /*
  * Enter "spill mode", meaning that no new groups are added to any of the hash
  * tables. Tuples that would create a new group are instead spilled, and
  * processed later.
  */
-static void
-hash_agg_enter_spill_mode(AggState *aggstate)
+static inline void
+agg_enter_spill_mode(AggState *aggstate, bool ishash)
 {
-	INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
-	aggstate->hash_spill_mode = true;
-	hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
-
-	if (!aggstate->hash_ever_spilled)
+	if (ishash)
 	{
-		Assert(aggstate->hash_tapeset == NULL);
-		Assert(aggstate->hash_spills == NULL);
-
-		aggstate->hash_ever_spilled = true;
-
-		aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
+		INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->table_filled, true);
+	}
+	else
+	{
+		INJECTION_POINT("index-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->index_filled, true);
+	}
+
+	if (!aggstate->spill_ever_happened)
+	{
+		Assert(aggstate->spill_tapeset == NULL);
+		Assert(aggstate->spills == NULL);
 
-		aggstate->hash_spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+		aggstate->spill_ever_happened = true;
+		aggstate->spill_tapeset = LogicalTapeSetCreate(true, NULL, -1);
 
-		for (int setno = 0; setno < aggstate->num_hashes; setno++)
+		if (ishash)
 		{
-			AggStatePerHash perhash = &aggstate->perhash[setno];
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
-
-			hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
+			aggstate->spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+
+			for (int setno = 0; setno < aggstate->num_hashes; setno++)
+			{
+				AggStatePerHash perhash = &aggstate->perhash[setno];
+				HashAggSpill *spill = &aggstate->spills[setno];
+
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
 							   perhash->aggnode->numGroups,
 							   aggstate->hashentrysize);
+			}
+		}
+		else
+		{
+			aggstate->spills = palloc(sizeof(HashAggSpill));
+			agg_spill_init(aggstate->spills, aggstate->spill_tapeset, 0,
+						   aggstate->perindex->aggnode->numGroups,
+						   aggstate->hashentrysize);
 		}
 	}
 }
 
+static void
+hash_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, true);
+}
+
+static void
+index_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, false);
+}
+
 /*
  * Update metrics after filling the hash table.
  *
  * If reading from the outer plan, from_tape should be false; if reading from
  * another tape, from_tape should be true.
  */
-static void
-hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+static inline void
+agg_update_spill_metrics(AggState *aggstate, bool from_tape, int npartitions, bool ishash)
 {
 	Size		meta_mem;
 	Size		entry_mem;
-	Size		hashkey_mem;
+	Size		key_mem;
 	Size		buffer_mem;
 	Size		total_mem;
 
 	if (aggstate->aggstrategy != AGG_MIXED &&
-		aggstate->aggstrategy != AGG_HASHED)
+		aggstate->aggstrategy != AGG_HASHED &&
+		aggstate->aggstrategy != AGG_INDEX)
 		return;
 
-	/* memory for the hash table itself */
-	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
-
-	/* memory for hash entries */
-	entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
-
-	/* memory for byref transition states */
-	hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
-
+	if (ishash)
+	{
+		/* memory for the hash table itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
+
+		/* memory for hash entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	}
+	else
+	{
+		/* memory for the index itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt, true);
+
+		/* memory for the index nodes */
+		meta_mem += MemoryContextMemAllocated(aggstate->index_nodecxt, true);
+
+		/* memory for index entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory, true);
+	}
 	/* memory for read/write tape buffers, if spilled */
 	buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
 	if (from_tape)
 		buffer_mem += HASHAGG_READ_BUFFER_SIZE;
 
 	/* update peak mem */
-	total_mem = meta_mem + entry_mem + hashkey_mem + buffer_mem;
-	if (total_mem > aggstate->hash_mem_peak)
-		aggstate->hash_mem_peak = total_mem;
+	total_mem = meta_mem + entry_mem + key_mem + buffer_mem;
+	if (total_mem > aggstate->spill_mem_peak)
+		aggstate->spill_mem_peak = total_mem;
 
 	/* update disk usage */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		uint64		disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
+		uint64		disk_used = LogicalTapeSetBlocks(aggstate->spill_tapeset) * (BLCKSZ / 1024);
 
-		if (aggstate->hash_disk_used < disk_used)
-			aggstate->hash_disk_used = disk_used;
+		if (aggstate->spill_disk_used < disk_used)
+			aggstate->spill_disk_used = disk_used;
 	}
 
 	/* update hashentrysize estimate based on contents */
-	if (aggstate->hash_ngroups_current > 0)
+	if (aggstate->spill_ngroups_current > 0)
 	{
-		aggstate->hashentrysize =
-			TupleHashEntrySize() +
-			(hashkey_mem / (double) aggstate->hash_ngroups_current);
+		if (ishash)
+		{
+			aggstate->hashentrysize =
+				TupleHashEntrySize() +
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
+		else
+		{
+			/* index stores MinimalTuples directly without any wrapper */
+			aggstate->hashentrysize =
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
 	}
 }
 
+static void
+hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, true);
+}
+
+static void
+index_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, false);
+}
+
 /*
  * Create memory contexts used for hash aggregation.
  */
@@ -2048,6 +2177,33 @@ hash_create_memory(AggState *aggstate)
 
 }
 
+/*
+ * Create memory contexts used for index aggregation.
+ */
+static void
+index_create_memory(AggState *aggstate)
+{
+	Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
+	
+	aggstate->indexcontext = CreateWorkExprContext(aggstate->ss.ps.state);
+	
+	aggstate->index_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													"IndexAgg meta context",
+													ALLOCSET_DEFAULT_SIZES);
+	aggstate->index_nodecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg node context",
+												ALLOCSET_SMALL_SIZES);
+
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+	aggstate->index_entrycxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg table context",
+												ALLOCSET_DEFAULT_MINSIZE,
+												ALLOCSET_DEFAULT_INITSIZE,
+												maxBlockSize);
+}
+
 /*
  * Choose a reasonable number of buckets for the initial hash table size.
  */
@@ -2141,7 +2297,7 @@ initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
 	AggStatePerGroup pergroup;
 	int			transno;
 
-	aggstate->hash_ngroups_current++;
+	aggstate->spill_ngroups_current++;
 	hash_agg_check_limits(aggstate);
 
 	/* no need to allocate or initialize per-group state */
@@ -2196,9 +2352,9 @@ lookup_hash_entries(AggState *aggstate)
 		bool	   *p_isnew;
 
 		/* if hash table already spilled, don't create new entries */
-		p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
-		select_current_set(aggstate, setno, true);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_HASH);
 		prepare_hash_slot(perhash,
 						  outerslot,
 						  hashslot);
@@ -2214,15 +2370,15 @@ lookup_hash_entries(AggState *aggstate)
 		}
 		else
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 			TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
 
 			if (spill->partitions == NULL)
-				hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
-								   perhash->aggnode->numGroups,
-								   aggstate->hashentrysize);
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perhash->aggnode->numGroups,
+							   aggstate->hashentrysize);
 
-			hashagg_spill_tuple(aggstate, spill, slot, hash);
+			agg_spill_tuple(aggstate, spill, slot, hash);
 			pergroup[setno] = NULL;
 		}
 	}
@@ -2265,6 +2421,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_INDEX:
+				if (!node->index_filled)
+					agg_fill_index(node);
+
+				result = agg_retrieve_index(node);
+				break;
 		}
 
 		if (!TupIsNull(result))
@@ -2381,7 +2543,7 @@ agg_retrieve_direct(AggState *aggstate)
 				aggstate->table_filled = true;
 				ResetTupleHashIterator(aggstate->perhash[0].hashtable,
 									   &aggstate->perhash[0].hashiter);
-				select_current_set(aggstate, 0, true);
+				select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
 				return agg_retrieve_hash_table(aggstate);
 			}
 			else
@@ -2601,7 +2763,7 @@ agg_retrieve_direct(AggState *aggstate)
 
 		prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
 
-		select_current_set(aggstate, currentSet, false);
+		select_current_set(aggstate, currentSet, GROUPING_STRATEGY_SORT);
 
 		finalize_aggregates(aggstate,
 							peragg,
@@ -2683,19 +2845,19 @@ agg_refill_hash_table(AggState *aggstate)
 	HashAggBatch *batch;
 	AggStatePerHash perhash;
 	HashAggSpill spill;
-	LogicalTapeSet *tapeset = aggstate->hash_tapeset;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
 	bool		spill_initialized = false;
 
-	if (aggstate->hash_batches == NIL)
+	if (aggstate->spill_batches == NIL)
 		return false;
 
 	/* hash_batches is a stack, with the top item at the end of the list */
-	batch = llast(aggstate->hash_batches);
-	aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
+	batch = llast(aggstate->spill_batches);
+	aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
 
-	hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
-						batch->used_bits, &aggstate->hash_mem_limit,
-						&aggstate->hash_ngroups_limit, NULL);
+	agg_set_limits(aggstate->hashentrysize, batch->input_card,
+				   batch->used_bits, &aggstate->spill_mem_limit,
+				   &aggstate->spill_ngroups_limit, NULL);
 
 	/*
 	 * Each batch only processes one grouping set; set the rest to NULL so
@@ -2712,7 +2874,7 @@ agg_refill_hash_table(AggState *aggstate)
 	for (int setno = 0; setno < aggstate->num_hashes; setno++)
 		ResetTupleHashTable(aggstate->perhash[setno].hashtable);
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 
 	/*
 	 * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
@@ -2726,7 +2888,7 @@ agg_refill_hash_table(AggState *aggstate)
 		aggstate->phase = &aggstate->phases[aggstate->current_phase];
 	}
 
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 
 	perhash = &aggstate->perhash[aggstate->current_set];
 
@@ -2737,19 +2899,19 @@ agg_refill_hash_table(AggState *aggstate)
 	 * We still need the NULL check, because we are only processing one
 	 * grouping set at a time and the rest will be NULL.
 	 */
-	hashagg_recompile_expressions(aggstate, true, true);
+	agg_recompile_expressions(aggstate, true, true);
 
 	INJECTION_POINT("hash-aggregate-process-batch", NULL);
 	for (;;)
 	{
-		TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
+		TupleTableSlot *spillslot = aggstate->spill_rslot;
 		TupleTableSlot *hashslot = perhash->hashslot;
 		TupleHashTable hashtable = perhash->hashtable;
 		TupleHashEntry entry;
 		MinimalTuple tuple;
 		uint32		hash;
 		bool		isnew = false;
-		bool	   *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		bool	   *p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2782,11 +2944,11 @@ agg_refill_hash_table(AggState *aggstate)
 				 * that we don't assign tapes that will never be used.
 				 */
 				spill_initialized = true;
-				hashagg_spill_init(&spill, tapeset, batch->used_bits,
-								   batch->input_card, aggstate->hashentrysize);
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
 			}
 			/* no memory for a new group, spill */
-			hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
 
 			aggstate->hash_pergroup[batch->setno] = NULL;
 		}
@@ -2806,16 +2968,16 @@ agg_refill_hash_table(AggState *aggstate)
 
 	if (spill_initialized)
 	{
-		hashagg_spill_finish(aggstate, &spill, batch->setno);
+		agg_spill_finish(aggstate, &spill, batch->setno);
 		hash_agg_update_metrics(aggstate, true, spill.npartitions);
 	}
 	else
 		hash_agg_update_metrics(aggstate, true, 0);
 
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 
 	/* prepare to walk the first hash table */
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 	ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
 						   &aggstate->perhash[batch->setno].hashiter);
 
@@ -2975,14 +3137,14 @@ agg_retrieve_hash_table_in_memory(AggState *aggstate)
 }
 
 /*
- * hashagg_spill_init
+ * agg_spill_init
  *
  * Called after we determined that spilling is necessary. Chooses the number
  * of partitions to create, and initializes them.
  */
 static void
-hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
-				   double input_groups, double hashentrysize)
+agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
+			   double input_groups, double hashentrysize)
 {
 	int			npartitions;
 	int			partition_bits;
@@ -3018,14 +3180,13 @@ hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
 }
 
 /*
- * hashagg_spill_tuple
+ * agg_spill_tuple
  *
- * No room for new groups in the hash table. Save for later in the appropriate
- * partition.
+ * No room for new groups in memory. Save for later in the appropriate partition.
  */
 static Size
-hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-					TupleTableSlot *inputslot, uint32 hash)
+agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+				TupleTableSlot *inputslot, uint32 hash)
 {
 	TupleTableSlot *spillslot;
 	int			partition;
@@ -3039,7 +3200,7 @@ hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
 	/* spill only attributes that we actually need */
 	if (!aggstate->all_cols_needed)
 	{
-		spillslot = aggstate->hash_spill_wslot;
+		spillslot = aggstate->spill_wslot;
 		slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
 		ExecClearTuple(spillslot);
 		for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
@@ -3167,14 +3328,14 @@ hashagg_finish_initial_spills(AggState *aggstate)
 	int			setno;
 	int			total_npartitions = 0;
 
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			total_npartitions += spill->npartitions;
-			hashagg_spill_finish(aggstate, spill, setno);
+			agg_spill_finish(aggstate, spill, setno);
 		}
 
 		/*
@@ -3182,21 +3343,21 @@ hashagg_finish_initial_spills(AggState *aggstate)
 		 * processing batches of spilled tuples. The initial spill structures
 		 * are no longer needed.
 		 */
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	hash_agg_update_metrics(aggstate, false, total_npartitions);
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 }
 
 /*
- * hashagg_spill_finish
+ * agg_spill_finish
  *
  * Transform spill partitions into new batches.
  */
 static void
-hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
+agg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 {
 	int			i;
 	int			used_bits = 32 - spill->shift;
@@ -3223,8 +3384,8 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 		new_batch = hashagg_batch_new(tape, setno,
 									  spill->ntuples[i], cardinality,
 									  used_bits);
-		aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
-		aggstate->hash_batches_used++;
+		aggstate->spill_batches = lappend(aggstate->spill_batches, new_batch);
+		aggstate->spill_batches_used++;
 	}
 
 	pfree(spill->ntuples);
@@ -3239,33 +3400,670 @@ static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
 	/* free spills from initial pass */
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		int			setno;
 
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			pfree(spill->ntuples);
 			pfree(spill->partitions);
 		}
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	/* free batches */
-	list_free_deep(aggstate->hash_batches);
-	aggstate->hash_batches = NIL;
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
 
 	/* close tape set */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		LogicalTapeSetClose(aggstate->hash_tapeset);
-		aggstate->hash_tapeset = NULL;
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
 	}
 }
+
+/*
+ * agg_fill_index
+ *
+ * Read the outer plan's tuples and build the in-memory index, spilling to
+ * disk when the memory limit is reached.
+ */
+static void
+agg_fill_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *tmpcontext = aggstate->tmpcontext;
+	
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *outerslot;
+
+		outerslot = fetch_input_tuple(aggstate);
+		if (TupIsNull(outerslot))
+			break;
+
+		/* set up for lookup_index_entries and advance_aggregates */
+		tmpcontext->ecxt_outertuple = outerslot;
+
+		/* insert input tuple to index possibly spilling index to disk */
+		lookup_index_entries(aggstate);
+
+		/* Advance the aggregates (or combine functions) */
+		advance_aggregates(aggstate);
+
+		/*
+		 * Reset the per-input-tuple context after each tuple.
+		 */
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	/*
+	 * Mark the index as filled here, so that after recompilation the
+	 * expressions expect a MinimalTuple instead of the outer plan's slot
+	 * type.
+	 */
+	aggstate->index_filled = true;
+
+	indexagg_finish_initial_spills(aggstate);
+
+	/*
+	 * This is only useful when no spill occurred and projection happens from
+	 * the in-memory index, but initialize it unconditionally.
+	 */
+	select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
+	InitTupleIndexIterator(perindex->index, &perindex->iter);
+}
+
+/*
+ * Extract the attributes that make up the grouping key into the
+ * indexslot. This is necessary to perform comparisons in the index.
+ */
+static void
+prepare_index_slot(AggStatePerIndex perindex,
+				   TupleTableSlot *inputslot,
+				   TupleTableSlot *indexslot)
+{
+	slot_getsomeattrs(inputslot, perindex->largestGrpColIdx);
+	ExecClearTuple(indexslot);
+	
+	for (int i = 0; i < perindex->numCols; ++i)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		indexslot->tts_values[i] = inputslot->tts_values[varNumber];
+		indexslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
+	}
+	ExecStoreVirtualTuple(indexslot);
+}
+
+/*
+ * indexagg_reset_spill_state
+ *
+ * Free spill and batch state and close the tape set for the index strategy.
+ */
+static void
+indexagg_reset_spill_state(AggState *aggstate)
+{
+	/* free spills from initial pass */
+	if (aggstate->spills != NULL)
+	{
+		HashAggSpill *spill = &aggstate->spills[0];
+		pfree(spill->ntuples);
+		pfree(spill->partitions);
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
+	}
+
+	/* free batches */
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
+
+	/* close tape set */
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * Initialize the per-group states of a freshly-created index entry.
+ */
+static void
+initialize_index_entry(AggState *aggstate, TupleIndex index, TupleIndexEntry entry)
+{
+	AggStatePerGroup pergroup;
+
+	aggstate->spill_ngroups_current++;
+	index_agg_check_limits(aggstate);
+
+	/* no need to allocate or initialize per-group state */
+	if (aggstate->numtrans == 0)
+		return;		
+
+	pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(index, entry);
+	
+	/*
+	 * Initialize aggregates for the new tuple group; lookup_index_entries()
+	 * has already selected the relevant grouping set.
+	 */
+	for (int transno = 0; transno < aggstate->numtrans; ++transno)
+	{
+		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		AggStatePerGroup pergroupstate = &pergroup[transno];
+		
+		initialize_aggregate(aggstate, pertrans, pergroupstate);
+	}
+}
+
+/*
+ * Create a new sorted run from the current in-memory index.
+ */
+static void
+indexagg_save_index_run(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *econtext;
+	TupleIndexIteratorData iter;
+	AggStatePerAgg peragg;
+	TupleTableSlot *firstSlot;
+	TupleIndexEntry entry;
+	TupleTableSlot *indexslot;
+	AggStatePerGroup pergroup;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	indexslot = perindex->indexslot;
+
+	InitTupleIndexIterator(perindex->index, &iter);
+	
+	tuplemerge_start_run(aggstate->mergestate);
+
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+	{
+		MinimalTuple tuple = TupleIndexEntryGetMinimalTuple(entry);
+		TupleTableSlot *output;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(tuple, indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		output = project_aggregates(aggstate);
+		if (output)
+			tuplemerge_puttupleslot(aggstate->mergestate, output);
+	}
+
+	tuplemerge_end_run(aggstate->mergestate);
+}
+
+/*
+ * Refill the in-memory index with the tuples of the given batch.
+ */
+static void
+indexagg_refill_batch(AggState *aggstate, HashAggBatch *batch)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *spillslot = aggstate->spill_rslot;
+	TupleTableSlot *indexslot = perindex->indexslot;
+	TupleIndex index = perindex->index;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
+	HashAggSpill spill;
+	bool	spill_initialized = false;
+	int nspill = 0;
+	
+	agg_set_limits(aggstate->hashentrysize, batch->input_card, batch->used_bits,
+				   &aggstate->spill_mem_limit, &aggstate->spill_ngroups_limit, NULL);
+
+	ReScanExprContext(aggstate->indexcontext);
+
+	MemoryContextReset(aggstate->index_entrycxt);
+	MemoryContextReset(aggstate->index_nodecxt);
+	ResetTupleIndex(perindex->index);
+
+	aggstate->spill_ngroups_current = 0;
+
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	agg_recompile_expressions(aggstate, true, true);
+
+	for (;;)
+	{
+		MinimalTuple tuple;
+		TupleIndexEntry entry;
+		bool		isnew = false;
+		bool	   *p_isnew;
+		uint32		hash;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		tuple = hashagg_batch_read(batch, &hash);
+		if (tuple == NULL)
+			break;
+
+		ExecStoreMinimalTuple(tuple, spillslot, true);
+		aggstate->tmpcontext->ecxt_outertuple = spillslot;
+
+		prepare_index_slot(perindex, spillslot, indexslot);
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		entry = TupleIndexLookup(index, indexslot, p_isnew);
+
+		if (entry != NULL)
+		{
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			aggstate->all_pergroups[batch->setno] = TupleIndexEntryGetAdditional(index, entry);
+			advance_aggregates(aggstate);
+		}
+		else
+		{
+			if (!spill_initialized)
+			{
+				spill_initialized = true;
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
+			}
+			nspill++;
+
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
+			aggstate->all_pergroups[batch->setno] = NULL;
+		}
+		
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	LogicalTapeClose(batch->input_tape);
+
+	if (spill_initialized)
+	{
+		agg_spill_finish(aggstate, &spill, 0);
+		index_agg_update_metrics(aggstate, true, spill.npartitions);
+	}
+	else
+		index_agg_update_metrics(aggstate, true, 0);
+
+	aggstate->spill_mode = false;
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	pfree(batch);
+}
+
+/*
+ * indexagg_finish_initial_spills
+ *
+ * After all input has been processed, finish any spills: save the current
+ * in-memory index as the first sorted run, refill the index from each
+ * spilled batch producing further runs, and perform the final merge.
+ */
+static void
+indexagg_finish_initial_spills(AggState *aggstate)
+{
+	HashAggSpill *spill;
+	AggStatePerIndex perindex;
+	Sort		 *sort;
+
+	if (!aggstate->spill_ever_happened)
+		return;
+
+	Assert(aggstate->spills != NULL);
+
+	spill = aggstate->spills;
+	agg_spill_finish(aggstate, aggstate->spills, 0);
+
+	index_agg_update_metrics(aggstate, false, spill->npartitions);
+	aggstate->spill_mode = false;
+
+	pfree(aggstate->spills);
+	aggstate->spills = NULL;
+
+	perindex = aggstate->perindex;
+	sort = aggstate->index_sort;
+	aggstate->mergestate = tuplemerge_begin_heap(aggstate->ss.ps.ps_ResultTupleDesc,
+												 perindex->numKeyCols,
+												 perindex->idxKeyColIdxTL,
+												 sort->sortOperators,
+												 sort->collations,
+												 sort->nullsFirst,
+												 work_mem, NULL);
+	/*
+	 * Some data was spilled.  Index aggregate requires the output to be
+	 * sorted, so now we must process all remaining spilled data and produce
+	 * sorted runs for the external merge.  The first saved run is the
+	 * currently open in-memory index.
+	 */
+	indexagg_save_index_run(aggstate);
+
+	while (aggstate->spill_batches != NIL)
+	{
+		HashAggBatch *batch = llast(aggstate->spill_batches);
+		aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
+
+		indexagg_refill_batch(aggstate, batch);
+		indexagg_save_index_run(aggstate);
+	}
+
+	tuplemerge_performmerge(aggstate->mergestate);
+}
+
+/*
+ * Calculate the hash of the grouping key in the given slot, used to assign
+ * the tuple to a spill partition.
+ */
+static uint32
+index_calculate_input_slot_hash(AggState *aggstate,
+								TupleTableSlot *inputslot)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext oldcxt;
+	uint32 hash;
+	bool isnull;
+	
+	oldcxt = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+	
+	perindex->exprcontext->ecxt_innertuple = inputslot;
+	hash = DatumGetUInt32(ExecEvalExpr(perindex->indexhashexpr,
+									   perindex->exprcontext,
+									   &isnull));
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return hash;
+}
+
+/*
+ * lookup_index_entries
+ *
+ * Insert the current input tuple into the in-memory index of each grouping
+ * set, spilling it to disk if there is no room for a new group.
+ */
+static void
+lookup_index_entries(AggState *aggstate)
+{
+	int numGroupingSets = Max(aggstate->maxsets, 1);
+	AggStatePerGroup *pergroup = aggstate->all_pergroups;
+	TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
+
+	for (int setno = 0; setno < numGroupingSets; ++setno)
+	{
+		AggStatePerIndex	perindex = &aggstate->perindex[setno];
+		TupleIndex		index = perindex->index;
+		TupleTableSlot *indexslot = perindex->indexslot;
+		TupleIndexEntry	entry;
+		bool			isnew = false;
+		bool		   *p_isnew;
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_INDEX);
+
+		prepare_index_slot(perindex, outerslot, indexslot);
+
+		/* Lookup entry in btree */
+		entry = TupleIndexLookup(perindex->index, indexslot, p_isnew);
+
+		/* A non-NULL entry means the tuple fit in memory - no spill needed */
+		if (entry != NULL)
+		{
+			/* Initialize its trans state if just created */
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			pergroup[setno] = TupleIndexEntryGetAdditional(index, entry);
+		}
+		else
+		{
+			HashAggSpill *spill = &aggstate->spills[setno];
+			uint32 hash;
+			
+			if (spill->partitions == NULL)
+			{
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perindex->aggnode->numGroups,
+							   aggstate->hashentrysize);
+			}
+
+			hash = index_calculate_input_slot_hash(aggstate, indexslot);
+			agg_spill_tuple(aggstate, spill, outerslot, hash);
+			pergroup[setno] = NULL;
+		}
+	}
+}
+
+static TupleTableSlot *
+agg_retrieve_index_in_memory(AggState *aggstate)
+{
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+	AggStatePerIndex perindex;
+	AggStatePerAgg peragg;
+	AggStatePerGroup pergroup;
+	TupleTableSlot *result;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	perindex = &aggstate->perindex[aggstate->current_set];
+
+	for (;;)
+	{
+		TupleIndexEntry entry;
+		TupleTableSlot *indexslot = perindex->indexslot;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		entry = TupleIndexIteratorNext(&perindex->iter);
+		if (entry == NULL)
+			return NULL;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(TupleIndexEntryGetMinimalTuple(entry), indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+		
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+		
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		result = project_aggregates(aggstate);
+		if (result)
+			return result;
+	}
+}
+
+/*
+ * Return the next output tuple from the final external merge of sorted runs.
+ */
+static TupleTableSlot *
+agg_retrieve_index_merge(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *slot = perindex->mergeslot;
+	TupleTableSlot *resultslot = aggstate->ss.ps.ps_ResultTupleSlot;
+	
+	ExecClearTuple(slot);
+	
+	if (!tuplesort_gettupleslot(aggstate->mergestate, true, true, slot, NULL))
+		return NULL;
+
+	slot_getallattrs(slot);
+	ExecClearTuple(resultslot);
+	
+	for (int i = 0; i < resultslot->tts_tupleDescriptor->natts; ++i)
+	{
+		resultslot->tts_values[i] = slot->tts_values[i];
+		resultslot->tts_isnull[i] = slot->tts_isnull[i];
+	}
+	ExecStoreVirtualTuple(resultslot);
+
+	return resultslot;
+}
+
+/*
+ * Return the next output tuple, either from the in-memory index or from the
+ * external merge if any data was spilled.
+ */
+static TupleTableSlot *
+agg_retrieve_index(AggState *aggstate)
+{
+	if (aggstate->spill_ever_happened)
+		return agg_retrieve_index_merge(aggstate);
+	else
+		return agg_retrieve_index_in_memory(aggstate);
+}
+
+/*
+ * Build the in-memory tuple index and the hash expression used to assign
+ * spilled tuples to partitions.
+ */
+static void
+build_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext metacxt = aggstate->index_metacxt;
+	MemoryContext entrycxt = aggstate->index_entrycxt;
+	MemoryContext nodecxt = aggstate->index_nodecxt;
+	MemoryContext oldcxt;
+	Size	additionalsize;
+	Oid	   *eqfuncoids;
+	Sort   *sort;
+
+	Assert(aggstate->aggstrategy == AGG_INDEX);
+
+	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
+	sort = aggstate->index_sort;
+
+	/* in-memory index */
+	perindex->index = BuildTupleIndex(perindex->indexslot->tts_tupleDescriptor,
+									  perindex->numKeyCols,
+									  perindex->idxKeyColIdxIndex,
+									  sort->sortOperators,
+									  sort->collations,
+									  sort->nullsFirst,
+									  additionalsize,
+									  metacxt,
+									  entrycxt,
+									  nodecxt);
+
+	/* disk spill logic */
+	oldcxt = MemoryContextSwitchTo(metacxt);
+	execTuplesHashPrepare(perindex->numKeyCols, perindex->aggnode->grpOperators,
+						  &eqfuncoids, &perindex->hashfunctions);
+	perindex->indexhashexpr =
+		ExecBuildHash32FromAttrs(perindex->indexslot->tts_tupleDescriptor,
+								 perindex->indexslot->tts_ops,
+								 perindex->hashfunctions,
+								 perindex->aggnode->grpCollations,
+								 perindex->numKeyCols,
+								 perindex->idxKeyColIdxIndex,
+								 &aggstate->ss.ps,
+								 0);
+	perindex->exprcontext = CreateStandaloneExprContext();
+	MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * Determine which input columns the index needs: the grouping key columns
+ * first, followed by any other columns referenced by the aggregates.
+ */
+static void
+find_index_columns(AggState *aggstate)
+{
+	Bitmapset  *base_colnos;
+	Bitmapset  *aggregated_colnos;
+	TupleDesc	scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	List	   *outerTlist = outerPlanState(aggstate)->plan->targetlist;
+	EState	   *estate = aggstate->ss.ps.state;
+	AggStatePerIndex perindex;
+	Bitmapset  *colnos;
+	AttrNumber *sortColIdx;
+	List	   *indexTlist = NIL;
+	TupleDesc   indexDesc;
+	int			maxCols;
+	int			i;
+
+	find_cols(aggstate, &aggregated_colnos, &base_colnos);
+	aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
+	aggstate->max_colno_needed = 0;
+	aggstate->all_cols_needed = true;
+
+	for (i = 0; i < scanDesc->natts; i++)
+	{
+		int		colno = i + 1;
+
+		if (bms_is_member(colno, aggstate->colnos_needed))
+			aggstate->max_colno_needed = colno;
+		else
+			aggstate->all_cols_needed = false;
+	}
+
+	perindex = aggstate->perindex;
+	colnos = bms_copy(base_colnos);
+
+	if (aggstate->phases[0].grouped_cols)
+	{
+		Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[0];
+		ListCell  *lc;
+		foreach(lc, aggstate->all_grouped_cols)
+		{
+			int attnum = lfirst_int(lc);
+			if (!bms_is_member(attnum, grouped_cols))
+				colnos = bms_del_member(colnos, attnum);
+		}
+	}
+
+	maxCols = bms_num_members(colnos) + perindex->numKeyCols;
+
+	perindex->idxKeyColIdxInput = palloc(maxCols * sizeof(AttrNumber));
+	perindex->idxKeyColIdxIndex = palloc(perindex->numKeyCols * sizeof(AttrNumber));
+
+	/* Add all the sorting/grouping columns to colnos */
+	sortColIdx = aggstate->index_sort->sortColIdx;
+	for (i = 0; i < perindex->numKeyCols; i++)
+		colnos = bms_add_member(colnos, sortColIdx[i]);
+	
+	for (i = 0; i < perindex->numKeyCols; i++)
+	{
+		perindex->idxKeyColIdxInput[i] = sortColIdx[i];
+		perindex->idxKeyColIdxIndex[i] = i + 1;
+
+		perindex->numCols++;
+		/* delete already mapped columns */
+		colnos = bms_del_member(colnos, sortColIdx[i]);
+	}
+	
+	/* and the remaining columns */
+	i = -1;
+	while ((i = bms_next_member(colnos, i)) >= 0)
+	{
+		perindex->idxKeyColIdxInput[perindex->numCols] = i;
+		perindex->numCols++;
+	}
+
+	/* build tuple descriptor for the index */
+	perindex->largestGrpColIdx = 0;
+	for (i = 0; i < perindex->numCols; i++)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		
+		indexTlist = lappend(indexTlist, list_nth(outerTlist, varNumber));
+		perindex->largestGrpColIdx = Max(varNumber + 1, perindex->largestGrpColIdx);
+	}
+
+	indexDesc = ExecTypeFromTL(indexTlist);
+	perindex->indexslot = ExecAllocTableSlot(&estate->es_tupleTable, indexDesc,
+										   &TTSOpsMinimalTuple);
+	list_free(indexTlist);
+	bms_free(colnos);
+
+	bms_free(base_colnos);
+}
 
 /* -----------------
  * ExecInitAgg
@@ -3297,10 +4095,12 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	int			numGroupingSets = 1;
 	int			numPhases;
 	int			numHashes;
+	int			numIndexes;
 	int			i = 0;
 	int			j = 0;
 	bool		use_hashing = (node->aggstrategy == AGG_HASHED ||
 							   node->aggstrategy == AGG_MIXED);
+	bool		use_index = (node->aggstrategy == AGG_INDEX);
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -3337,6 +4137,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	 */
 	numPhases = (use_hashing ? 1 : 2);
 	numHashes = (use_hashing ? 1 : 0);
+	numIndexes = (use_index ? 1 : 0);
 
 	/*
 	 * Calculate the maximum number of grouping sets in any phase; this
@@ -3356,7 +4157,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 			/*
 			 * additional AGG_HASHED aggs become part of phase 0, but all
-			 * others add an extra phase.
+			 * others add an extra phase.  AGG_INDEX does not support grouping
+			 * sets, so the else branch must be AGG_SORTED or AGG_MIXED.
 			 */
 			if (agg->aggstrategy != AGG_HASHED)
 				++numPhases;
@@ -3395,6 +4197,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (use_hashing)
 		hash_create_memory(aggstate);
+	else if (use_index)
+		index_create_memory(aggstate);
 
 	ExecAssignExprContext(estate, &aggstate->ss.ps);
 
@@ -3501,6 +4305,13 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		aggstate->phases[0].gset_lengths = palloc_array(int, numHashes);
 		aggstate->phases[0].grouped_cols = palloc_array(Bitmapset *, numHashes);
 	}
+	else if (numIndexes)
+	{
+		aggstate->perindex = palloc0(sizeof(AggStatePerIndexData) * numIndexes);
+		aggstate->phases[0].numsets = 0;
+		aggstate->phases[0].gset_lengths = palloc(numIndexes * sizeof(int));
+		aggstate->phases[0].grouped_cols = palloc(numIndexes * sizeof(Bitmapset *));
+	}
 
 	phase = 0;
 	for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
@@ -3513,6 +4324,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
 			sortnode = castNode(Sort, outerPlan(aggnode));
 		}
+		else if (use_index)
+		{
+			Assert(list_length(node->chain) == 1);
+
+			aggnode = node;
+			sortnode = castNode(Sort, linitial(node->chain));
+			/*
+			 * The chain contains a single element, so bump the loop variable
+			 * to make this the only iteration.
+			 */
+			phaseidx++;
+		}
 		else
 		{
 			aggnode = node;
@@ -3549,6 +4372,35 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
 			continue;
 		}
+		else if (aggnode->aggstrategy == AGG_INDEX)
+		{
+			AggStatePerPhase phasedata = &aggstate->phases[0];
+			AggStatePerIndex perindex;
+			Bitmapset *cols;
+			
+			Assert(phase == 0);
+			Assert(sortnode);
+
+			i = phasedata->numsets++;
+			
+			/* phase 0 always points to the "real" Agg in the index case */
+			phasedata->aggnode = node;
+			phasedata->aggstrategy = node->aggstrategy;
+			phasedata->sortnode = sortnode;
+
+			perindex = &aggstate->perindex[i];
+			perindex->aggnode = aggnode;
+			aggstate->index_sort = sortnode;
+
+			phasedata->gset_lengths[i] = perindex->numKeyCols = aggnode->numCols;
+
+			cols = NULL;
+			for (j = 0; j < aggnode->numCols; ++j)
+				cols = bms_add_member(cols, aggnode->grpColIdx[j]);
+				
+			phasedata->grouped_cols[i] = cols;
+			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
+		}
 		else
 		{
 			AggStatePerPhase phasedata = &aggstate->phases[++phase];
@@ -3666,7 +4518,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	aggstate->all_pergroups = palloc0_array(AggStatePerGroup, numGroupingSets + numHashes);
 	pergroups = aggstate->all_pergroups;
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy != AGG_HASHED && node->aggstrategy != AGG_INDEX)
 	{
 		for (i = 0; i < numGroupingSets; i++)
 		{
@@ -3680,18 +4532,15 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	/*
 	 * Hashing can only appear in the initial phase.
 	 */
-	if (use_hashing)
+	if (use_hashing || use_index)
 	{
 		Plan	   *outerplan = outerPlan(node);
 		double		totalGroups = 0;
 
-		aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsMinimalTuple);
-		aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsVirtual);
-
-		/* this is an array of pointers, not structures */
-		aggstate->hash_pergroup = pergroups;
+		aggstate->spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsMinimalTuple);
+		aggstate->spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsVirtual);
 
 		aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
 													  outerplan->plan_width,
@@ -3706,20 +4555,115 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		for (int k = 0; k < aggstate->num_hashes; k++)
 			totalGroups += aggstate->perhash[k].aggnode->numGroups;
 
-		hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
-							&aggstate->hash_mem_limit,
-							&aggstate->hash_ngroups_limit,
-							&aggstate->hash_planned_partitions);
-		find_hash_columns(aggstate);
+		agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
+					   &aggstate->spill_mem_limit,
+					   &aggstate->spill_ngroups_limit,
+					   &aggstate->spill_planned_partitions);
 
-		/* Skip massive memory allocation if we are just doing EXPLAIN */
-		if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-			build_hash_tables(aggstate);
+		if (use_hashing)
+		{
+			/* this is an array of pointers, not structures */
+			aggstate->hash_pergroup = pergroups;
+	
+			find_hash_columns(aggstate);
+
+			/* Skip massive memory allocation if we are just doing EXPLAIN */
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_hash_tables(aggstate);
+			aggstate->table_filled = false;
+		}
+		else
+		{
+			find_index_columns(aggstate);
+
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_index(aggstate);
+			aggstate->index_filled = false;
+		}
-
-		aggstate->table_filled = false;
 
 		/* Initialize this to 1, meaning nothing spilled, yet */
-		aggstate->hash_batches_used = 1;
+		aggstate->spill_batches_used = 1;
+	}
+
+	/*
+	 * The index strategy may need to spill to disk, in which case we perform
+	 * an external merge.  The spilled tuples are already projected, so they
+	 * have a different TupleDesc than the in-memory ones (inputDesc and
+	 * indexDesc).
+	 */
+	if (use_index)
+	{
+		AggStatePerIndex perindex = aggstate->perindex;
+		ListCell *lc;
+		List *targetlist = aggstate->ss.ps.plan->targetlist;
+		AttrNumber *attr_mapping_tl = 
+						palloc0(sizeof(AttrNumber) * list_length(targetlist));
+		AttrNumber *keyColIdxResult;
+
+		/*
+		 * Build the grouping column attribute mapping and store it in
+		 * attr_mapping_tl.  If a targetlist entry has no such mapping (it is
+		 * a projected expression), InvalidAttrNumber is stored; otherwise,
+		 * the outer plan's attribute number for that entry.
+		 */
+		foreach (lc, targetlist)
+		{
+			TargetEntry *te = (TargetEntry *)lfirst(lc);
+			Var *group_var;
+
+			/* All grouping expressions in targetlist stored as OUTER Vars */
+			if (!IsA(te->expr, Var))
+				continue;
+			
+			group_var = (Var *)te->expr;
+			if (group_var->varno != OUTER_VAR)
+				continue;
+
+			attr_mapping_tl[foreach_current_index(lc)] = group_var->varattno;
+		}
+
+		/* Mapping is built and now create reverse mapping */
+		keyColIdxResult = palloc0(sizeof(AttrNumber) * list_length(outerPlan(node)->targetlist));
+		for (i = 0; i < list_length(targetlist); ++i)
+		{
+			AttrNumber outer_attno = attr_mapping_tl[i];
+			AttrNumber existingIdx;
+
+			if (!AttributeNumberIsValid(outer_attno))
+				continue;
+
+			existingIdx = keyColIdxResult[outer_attno - 1];
+			
+			/* attnumbers can be duplicated, so keep the first occurrence */
+			if (AttributeNumberIsValid(existingIdx) && existingIdx <= outer_attno)
+				continue;
+
+			/*
+			 * A column can be referenced in the query, but the planner may
+			 * decide to remove it from the grouping.
+			 */
+			if (!bms_is_member(outer_attno, all_grouped_cols))
+				continue;
+
+			keyColIdxResult[outer_attno - 1] = i + 1;
+		}
+
+		perindex->idxKeyColIdxTL = palloc(sizeof(AttrNumber) * perindex->numKeyCols);
+		for (i = 0; i < perindex->numKeyCols; ++i)
+		{
+			AttrNumber attno = keyColIdxResult[perindex->idxKeyColIdxInput[i] - 1];
+			if (!AttributeNumberIsValid(attno))
+				elog(ERROR, "could not locate group by attributes in targetlist for index mapping");
+
+			perindex->idxKeyColIdxTL[i] = attno;
+		}
+
+		pfree(attr_mapping_tl);
+		pfree(keyColIdxResult);
+
+		perindex->mergeslot = ExecInitExtraTupleSlot(estate,
+													 aggstate->ss.ps.ps_ResultTupleDesc, 
+													 &TTSOpsMinimalTuple);
 	}
 
 	/*
@@ -3732,13 +4676,19 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		aggstate->current_phase = 0;
 		initialize_phase(aggstate, 0);
-		select_current_set(aggstate, 0, true);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
+	}
+	else if (node->aggstrategy == AGG_INDEX)
+	{
+		aggstate->current_phase = 0;
+		initialize_phase(aggstate, 0);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
 	}
 	else
 	{
 		aggstate->current_phase = 1;
 		initialize_phase(aggstate, 1);
-		select_current_set(aggstate, 0, false);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_SORT);
 	}
 
 	/*
@@ -4066,8 +5016,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
 	{
 		AggStatePerPhase phase = &aggstate->phases[phaseidx];
-		bool		dohash = false;
-		bool		dosort = false;
+		int			strategy;
 
 		/* phase 0 doesn't necessarily exist */
 		if (!phase->aggnode)
@@ -4079,8 +5028,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			 * Phase one, and only phase one, in a mixed agg performs both
 			 * sorting and aggregation.
 			 */
-			dohash = true;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_HASH | GROUPING_STRATEGY_SORT;
 		}
 		else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
 		{
@@ -4094,19 +5042,20 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		else if (phase->aggstrategy == AGG_PLAIN ||
 				 phase->aggstrategy == AGG_SORTED)
 		{
-			dohash = false;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_SORT;
 		}
 		else if (phase->aggstrategy == AGG_HASHED)
 		{
-			dohash = true;
-			dosort = false;
+			strategy = GROUPING_STRATEGY_HASH;
+		}
+		else if (phase->aggstrategy == AGG_INDEX)
+		{
+			strategy = GROUPING_STRATEGY_INDEX;
 		}
 		else
 			Assert(false);
 
-		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
-											 false);
+		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, strategy, false);
 
 		/* cache compiled expression for outer slot without NULL check */
 		phase->evaltrans_cache[0][0] = phase->evaltrans;
@@ -4409,9 +5358,9 @@ ExecEndAgg(AggState *node)
 
 		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
 		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		si->hash_batches_used = node->hash_batches_used;
-		si->hash_disk_used = node->hash_disk_used;
-		si->hash_mem_peak = node->hash_mem_peak;
+		si->hash_batches_used = node->spill_batches_used;
+		si->hash_disk_used = node->spill_disk_used;
+		si->hash_mem_peak = node->spill_mem_peak;
 	}
 
 	/* Make sure we have closed any open tuplesorts */
@@ -4421,7 +5370,10 @@ ExecEndAgg(AggState *node)
 	if (node->sort_out)
 		tuplesort_end(node->sort_out);
 
-	hashagg_reset_spill_state(node);
+	if (node->aggstrategy == AGG_INDEX)
+		indexagg_reset_spill_state(node);
+	else
+		hashagg_reset_spill_state(node);
 
 	/* Release hash tables too */
 	if (node->hash_metacxt != NULL)
@@ -4434,6 +5386,26 @@ ExecEndAgg(AggState *node)
 		MemoryContextDelete(node->hash_tuplescxt);
 		node->hash_tuplescxt = NULL;
 	}
+	if (node->index_metacxt != NULL)
+	{
+		MemoryContextDelete(node->index_metacxt);
+		node->index_metacxt = NULL;
+	}
+	if (node->index_entrycxt != NULL)
+	{
+		MemoryContextDelete(node->index_entrycxt);
+		node->index_entrycxt = NULL;
+	}
+	if (node->index_nodecxt != NULL)
+	{
+		MemoryContextDelete(node->index_nodecxt);
+		node->index_nodecxt = NULL;
+	}
+	if (node->mergestate)
+	{
+		tuplesort_end(node->mergestate);
+		node->mergestate = NULL;
+	}
 
 	for (transno = 0; transno < node->numtrans; transno++)
 	{
@@ -4451,6 +5423,8 @@ ExecEndAgg(AggState *node)
 		ReScanExprContext(node->aggcontexts[setno]);
 	if (node->hashcontext)
 		ReScanExprContext(node->hashcontext);
+	if (node->indexcontext)
+		ReScanExprContext(node->indexcontext);
 
 	outerPlan = outerPlanState(node);
 	ExecEndNode(outerPlan);
@@ -4486,12 +5460,27 @@ ExecReScanAgg(AggState *node)
 		 * we can just rescan the existing hash table; no need to build it
 		 * again.
 		 */
-		if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
 			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
 		{
 			ResetTupleHashIterator(node->perhash[0].hashtable,
 								   &node->perhash[0].hashiter);
-			select_current_set(node, 0, true);
+			select_current_set(node, 0, GROUPING_STRATEGY_HASH);
+			return;
+		}
+	}
+
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		if (!node->index_filled)
+			return;
+
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
+			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
+		{
+			AggStatePerIndex perindex = node->perindex;
+			ResetTupleIndexIterator(perindex->index, &perindex->iter);
+			select_current_set(node, 0, GROUPING_STRATEGY_INDEX);
 			return;
 		}
 	}
@@ -4545,9 +5534,9 @@ ExecReScanAgg(AggState *node)
 	{
 		hashagg_reset_spill_state(node);
 
-		node->hash_ever_spilled = false;
-		node->hash_spill_mode = false;
-		node->hash_ngroups_current = 0;
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
 
 		ReScanExprContext(node->hashcontext);
 		/* Rebuild empty hash table(s) */
@@ -4555,10 +5544,33 @@ ExecReScanAgg(AggState *node)
 		node->table_filled = false;
 		/* iterator will be reset when the table is filled */
 
-		hashagg_recompile_expressions(node, false, false);
+		agg_recompile_expressions(node, false, false);
 	}
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		indexagg_reset_spill_state(node);
+
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
+		
+		ReScanExprContext(node->indexcontext);
+		MemoryContextReset(node->index_entrycxt);
+		MemoryContextReset(node->index_nodecxt);
+
+		build_index(node);
+		node->index_filled = false;
+
+		agg_recompile_expressions(node, false, false);
+
+		if (node->mergestate)
+		{
+			tuplesort_end(node->mergestate);
+			node->mergestate = NULL;
+		}
+	}
+	else if (node->aggstrategy != AGG_HASHED)
 	{
 		/*
 		 * Reset the per-group state (in particular, mark transvalues null)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 88ae529e843..fc349707778 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1900,6 +1900,7 @@ static void
 inittapestate(Tuplesortstate *state, int maxTapes)
 {
 	int64		tapeSpace;
+	Size		memtuplesSize;
 
 	/*
 	 * Decrease availMem to reflect the space needed for tape buffers; but
@@ -1912,7 +1913,16 @@ inittapestate(Tuplesortstate *state, int maxTapes)
 	 */
 	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
 
-	if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
+	/*
+	 * In merge state, during initial run creation we do not use the
+	 * in-memory tuples array and write to tapes directly.
+	 */
+	if (state->memtuples != NULL)
+		memtuplesSize = GetMemoryChunkSpace(state->memtuples);
+	else
+		memtuplesSize = 0;
+
+	if (tapeSpace + memtuplesSize < state->allowedMem)
 		USEMEM(state, tapeSpace);
 
 	/*
@@ -2031,11 +2041,14 @@ mergeruns(Tuplesortstate *state)
 
 	/*
 	 * We no longer need a large memtuples array.  (We will allocate a smaller
-	 * one for the heap later.)
+	 * one for the heap later.)  Note that in merge state this array can be NULL.
 	 */
-	FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
-	pfree(state->memtuples);
-	state->memtuples = NULL;
+	if (state->memtuples)
+	{
+		FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
+		pfree(state->memtuples);
+		state->memtuples = NULL;
+	}
 
 	/*
 	 * Initialize the slab allocator.  We need one slab slot per input tape,
@@ -3157,3 +3170,189 @@ ssup_datum_int32_cmp(Datum x, Datum y, SortSupport ssup)
 	else
 		return 0;
 }
+
+/*
+ *    tuplemerge_begin_common
+ *
+ * Create a new Tuplesortstate for performing a merge only. This is used
+ * when we know that the input is already sorted but stored in multiple
+ * tapes, so we only have to perform the merge.
+ *
+ * Unlike tuplesort_begin_common it does not accept sortopt, because none
+ * of the current options (random access and bounded sort) are supported
+ * by merge.
+ */
+Tuplesortstate *
+tuplemerge_begin_common(int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state;
+	MemoryContext maincontext;
+	MemoryContext sortcontext;
+	MemoryContext oldcontext;
+
+	/*
+	 * Memory context surviving tuplesort_reset.  This memory context holds
+	 * data which is useful to keep while sorting multiple similar batches.
+	 */
+	maincontext = AllocSetContextCreate(CurrentMemoryContext,
+										"TupleMerge main",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Create a working memory context for one sort operation.  The content of
+	 * this context is deleted by tuplesort_reset.
+	 */
+	sortcontext = AllocSetContextCreate(maincontext,
+										"TupleMerge merge",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Make the Tuplesortstate within the per-sortstate context.  This way, we
+	 * don't need a separate pfree() operation for it at shutdown.
+	 */
+	oldcontext = MemoryContextSwitchTo(maincontext);
+
+	state = (Tuplesortstate *) palloc0(sizeof(Tuplesortstate));
+
+	if (trace_sort)
+		pg_rusage_init(&state->ru_start);
+
+	state->base.sortopt = TUPLESORT_NONE;
+	state->base.tuples = true;
+	state->abbrevNext = 10;
+
+	/*
+	 * workMem is forced to be at least 64KB, the current minimum valid value
+	 * for the work_mem GUC.  This is a defense against parallel sort callers
+	 * that divide out memory among many workers in a way that leaves each
+	 * with very little memory.
+	 */
+	state->allowedMem = Max(workMem, 64) * (int64) 1024;
+	state->base.sortcontext = sortcontext;
+	state->base.maincontext = maincontext;
+
+	/*
+	 * After all of the other non-parallel-related state, we setup all of the
+	 * state needed for each batch.
+	 */
+
+	/*
+	 * Merging does not accept RANDOMACCESS, so the only possible tuple
+	 * context is Bump, which saves some cycles.
+	 */
+	state->base.tuplecontext = BumpContextCreate(state->base.sortcontext,
+												 "Caller tuples",
+												 ALLOCSET_DEFAULT_SIZES);
+	
+	state->status = TSS_BUILDRUNS;
+	state->bounded = false;
+	state->boundUsed = false;
+	state->availMem = state->allowedMem;
+	
+	/*
+	 * When performing a merge we do not need the in-memory array for
+	 * sorting, so memtuples stays NULL.  The related counters are still
+	 * initialized, so that code invoked inappropriately in merge mode
+	 * does not fail.
+	 */
+	state->memtuples = NULL;
+	state->memtupcount = 0;
+	state->memtupsize = INITIAL_MEMTUPSIZE;
+	state->growmemtuples = true;
+	state->slabAllocatorUsed = false;
+
+	/*
+	 * Tape variables (inputTapes, outputTapes, etc.) will be initialized by
+	 * inittapes(), if needed.
+	 */
+	state->result_tape = NULL;	/* flag that result tape has not been formed */
+	state->tapeset = NULL;
+	
+	inittapes(state, true);
+
+	/*
+	 * Initialize parallel-related state based on coordination information
+	 * from caller
+	 */
+	if (!coordinate)
+	{
+		/* Serial sort */
+		state->shared = NULL;
+		state->worker = -1;
+		state->nParticipants = -1;
+	}
+	else if (coordinate->isWorker)
+	{
+		/* Parallel worker produces exactly one final run from all input */
+		state->shared = coordinate->sharedsort;
+		state->worker = worker_get_identifier(state);
+		state->nParticipants = -1;
+	}
+	else
+	{
+		/* Parallel leader state only used for final merge */
+		state->shared = coordinate->sharedsort;
+		state->worker = -1;
+		state->nParticipants = coordinate->nParticipants;
+		Assert(state->nParticipants >= 1);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_start_run(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+		return;
+
+	selectnewtape(state);
+	state->memtupcount = 0;
+}
+
+void
+tuplemerge_performmerge(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+	{
+		/*
+		 * We have started a new run, but no tuples were written to it.
+		 * mergeruns expects each run to have at least 1 tuple, otherwise
+		 * it will fail to even fill the initial merge heap.
+		 */
+		state->nOutputRuns--;
+	}
+	else
+		state->memtupcount = 0;
+
+	mergeruns(state);
+
+	state->current = 0;
+	state->eof_reached = false;
+	state->markpos_block = 0L;
+	state->markpos_offset = 0;
+	state->markpos_eof = false;
+}
+
+void
+tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple, Size tuplen)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->base.sortcontext);
+
+	Assert(state->destTape);	
+	WRITETUP(state, state->destTape, tuple);
+
+	MemoryContextSwitchTo(oldcxt);
+	
+	state->memtupcount++;
+}
+
+void
+tuplemerge_end_run(Tuplesortstate *state)
+{
+	if (state->memtupcount != 0)
+	{
+		markrunend(state->destTape);
+	}
+}
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index 079a51c474d..5f8afa8a17a 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -2071,3 +2071,108 @@ readtup_datum(Tuplesortstate *state, SortTuple *stup,
 	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
 		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
 }
+
+Tuplesortstate *
+tuplemerge_begin_heap(TupleDesc tupDesc,
+					  int nkeys, AttrNumber *attNums,
+					  Oid *sortOperators, Oid *sortCollations,
+					  bool *nullsFirstFlags,
+					  int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state = tuplemerge_begin_common(workMem, coordinate);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+
+	Assert(nkeys > 0);
+
+	if (trace_sort)
+		elog(LOG,
+			 "begin tuple merge: nkeys = %d, workMem = %d", nkeys, workMem);
+
+	base->nKeys = nkeys;
+
+	TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
+								false,	/* no unique check */
+								nkeys,
+								workMem,
+								false,
+								PARALLEL_SORT(coordinate));
+
+	base->removeabbrev = removeabbrev_heap;
+	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
+	base->writetup = writetup_heap;
+	base->readtup = readtup_heap;
+	base->haveDatum1 = true;
+	base->arg = tupDesc;		/* assume we need not copy tupDesc */
+
+	/* Prepare SortSupport data for each column */
+	base->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		SortSupport sortKey = base->sortKeys + i;
+
+		Assert(attNums[i] != 0);
+		Assert(sortOperators[i] != 0);
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization
+	 * is only of value to pass-by-value types anyway, whereas abbreviated
+	 * keys are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !base->sortKeys->abbrev_converter)
+		base->onlyKey = base->sortKeys;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+	TupleDesc	tupDesc = (TupleDesc) base->arg;
+	SortTuple	stup;
+	MinimalTuple tuple;
+	HeapTupleData htup;
+	Size		tuplen;
+
+	/* copy the tuple into sort storage */
+	tuple = ExecCopySlotMinimalTuple(slot);
+	stup.tuple = tuple;
+	/* set up first-column key value */
+	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
+	stup.datum1 = heap_getattr(&htup,
+							   base->sortKeys[0].ssup_attno,
+							   tupDesc,
+							   &stup.isnull1);
+
+	/* GetMemoryChunkSpace is not supported for bump contexts */
+	if (TupleSortUseBumpTupleCxt(base->sortopt))
+		tuplen = MAXALIGN(tuple->t_len);
+	else
+		tuplen = GetMemoryChunkSpace(tuple);
+
+	tuplemerge_puttuple_common(state, &stup, tuplen);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6192cc8d143..7c9efe77ab9 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -393,8 +393,16 @@ extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
+
+/* 
+ * Which strategy to use for aggregation/grouping
+ */
+#define GROUPING_STRATEGY_SORT			1
+#define GROUPING_STRATEGY_HASH			(1 << 1)
+#define GROUPING_STRATEGY_INDEX			(1 << 2)
+
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
-									bool doSort, bool doHash, bool nullcheck);
+									int groupStrategy, bool nullcheck);
 extern ExprState *ExecBuildHash32FromAttrs(TupleDesc desc,
 										   const TupleTableSlotOps *ops,
 										   FmgrInfo *hashfunctions,
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 6c4891bbaeb..8361d000878 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -321,6 +321,33 @@ typedef struct AggStatePerHashData
 	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */
 }			AggStatePerHashData;
 
+/* 
+ * AggStatePerIndexData - per-index state
+ *
+ * Logic is the same as for AggStatePerHashData - one of these for each
+ * grouping set.
+ */
+typedef struct AggStatePerIndexData
+{
+	TupleIndex	index;			/* current in-memory index data */
+	MemoryContext metacxt;		/* memory context containing TupleIndex */
+	MemoryContext tempctx;		/* short-lived context */
+	TupleTableSlot *indexslot; 	/* slot for loading index */
+	int			numCols;		/* total number of columns in index tuple */
+	int			numKeyCols;		/* number of key columns in index tuple */
+	int			largestGrpColIdx;	/* largest col required for comparison */
+	AttrNumber *idxKeyColIdxInput;	/* key column indices in input slot */
+	AttrNumber *idxKeyColIdxIndex;	/* key column indices in index tuples */
+	TupleIndexIteratorData iter;	/* iterator state for index */
+	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */	
+
+	/* state used only for spill mode */
+	AttrNumber	*idxKeyColIdxTL;	/* key column indices in target list */
+	FmgrInfo    *hashfunctions;	/* tuple hashing function */
+	ExprState   *indexhashexpr;	/* ExprState for hashing index datatype(s) */
+	ExprContext *exprcontext;	/* expression context */
+	TupleTableSlot *mergeslot;	/* slot for loading tuple during merge */
+}			AggStatePerIndexData;
 
 extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
@@ -328,9 +355,9 @@ extern void ExecReScanAgg(AggState *node);
 
 extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 								Size transitionSpace);
-extern void hash_agg_set_limits(double hashentrysize, double input_groups,
-								int used_bits, Size *mem_limit,
-								uint64 *ngroups_limit, int *num_partitions);
+extern void agg_set_limits(double hashentrysize, double input_groups,
+						   int used_bits, Size *mem_limit,
+						   uint64 *ngroups_limit, int *num_partitions);
 
 /* parallel instrumentation support */
 extern void ExecAggEstimate(AggState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 99ee472b51f..3bba2359e11 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2613,6 +2613,7 @@ typedef struct AggStatePerTransData *AggStatePerTrans;
 typedef struct AggStatePerGroupData *AggStatePerGroup;
 typedef struct AggStatePerPhaseData *AggStatePerPhase;
 typedef struct AggStatePerHashData *AggStatePerHash;
+typedef struct AggStatePerIndexData *AggStatePerIndex;
 
 typedef struct AggState
 {
@@ -2628,17 +2629,18 @@ typedef struct AggState
 	AggStatePerAgg peragg;		/* per-Aggref information */
 	AggStatePerTrans pertrans;	/* per-Trans state information */
 	ExprContext *hashcontext;	/* econtexts for long-lived data (hashtable) */
+	ExprContext *indexcontext;	/* econtexts for long-lived data (index) */
 	ExprContext **aggcontexts;	/* econtexts for long-lived data (per GS) */
 	ExprContext *tmpcontext;	/* econtext for input expressions */
-#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
+#define FIELDNO_AGGSTATE_CURAGGCONTEXT 15
 	ExprContext *curaggcontext; /* currently active aggcontext */
 	AggStatePerAgg curperagg;	/* currently active aggregate, if any */
-#define FIELDNO_AGGSTATE_CURPERTRANS 16
+#define FIELDNO_AGGSTATE_CURPERTRANS 17
 	AggStatePerTrans curpertrans;	/* currently active trans state, if any */
 	bool		input_done;		/* indicates end of input */
 	bool		agg_done;		/* indicates completion of Agg scan */
 	int			projected_set;	/* The last projected grouping set */
-#define FIELDNO_AGGSTATE_CURRENT_SET 20
+#define FIELDNO_AGGSTATE_CURRENT_SET 21
 	int			current_set;	/* The current grouping set being evaluated */
 	Bitmapset  *grouped_cols;	/* grouped cols in current projection */
 	List	   *all_grouped_cols;	/* list of all grouped cols in DESC order */
@@ -2660,32 +2662,43 @@ typedef struct AggState
 	int			num_hashes;
 	MemoryContext hash_metacxt; /* memory for hash table bucket array */
 	MemoryContext hash_tuplescxt;	/* memory for hash table tuples */
-	struct LogicalTapeSet *hash_tapeset;	/* tape set for hash spill tapes */
-	struct HashAggSpill *hash_spills;	/* HashAggSpill for each grouping set,
-										 * exists only during first pass */
-	TupleTableSlot *hash_spill_rslot;	/* for reading spill files */
-	TupleTableSlot *hash_spill_wslot;	/* for writing spill files */
-	List	   *hash_batches;	/* hash batches remaining to be processed */
-	bool		hash_ever_spilled;	/* ever spilled during this execution? */
-	bool		hash_spill_mode;	/* we hit a limit during the current batch
-									 * and we must not create new groups */
-	Size		hash_mem_limit; /* limit before spilling hash table */
-	uint64		hash_ngroups_limit; /* limit before spilling hash table */
-	int			hash_planned_partitions;	/* number of partitions planned
-											 * for first pass */
-	double		hashentrysize;	/* estimate revised during execution */
-	Size		hash_mem_peak;	/* peak hash table memory usage */
-	uint64		hash_ngroups_current;	/* number of groups currently in
-										 * memory in all hash tables */
-	uint64		hash_disk_used; /* kB of disk space used */
-	int			hash_batches_used;	/* batches used during entire execution */
-
 	AggStatePerHash perhash;	/* array of per-hashtable data */
 	AggStatePerGroup *hash_pergroup;	/* grouping set indexed array of
 										 * per-group pointers */
+	/* Fields used for managing spill mode in hash and index aggs */
+	struct LogicalTapeSet *spill_tapeset;	/* tape set for hash spill tapes */
+	struct HashAggSpill *spills;	/* HashAggSpill for each grouping set,
+									 * exists only during first pass */
+	TupleTableSlot *spill_rslot;	/* for reading spill files */
+	TupleTableSlot *spill_wslot;	/* for writing spill files */
+	List	   *spill_batches;	/* hash batches remaining to be processed */
+
+	bool		spill_ever_happened;	/* ever spilled during this execution? */
+	bool		spill_mode;	/* we hit a limit during the current batch
+							 * and we must not create new groups */
+	Size		spill_mem_limit; /* limit before spilling hash table or index */
+	uint64		spill_ngroups_limit; /* limit before spilling hash table or index */
+	int			spill_planned_partitions;	/* number of partitions planned
+											 * for first pass */
+	double		hashentrysize;	/* estimate revised during execution */
+	Size		spill_mem_peak;	/* peak memory usage of hash table or index */
+	uint64		spill_ngroups_current;	/* number of groups currently in
+										 * memory in all hash tables */
+	uint64		spill_disk_used; /* kB of disk space used */
+	int			spill_batches_used;	/* batches used during entire execution */
+
+	/* these fields are used in AGG_INDEX mode: */
+	AggStatePerIndex perindex;	/* pointer to per-index state data */
+	bool			index_filled;	/* index filled yet? */
+	MemoryContext	index_metacxt;	/* memory for index structure */
+	MemoryContext	index_nodecxt;	/* memory for index nodes */
+	MemoryContext	index_entrycxt;	/* memory for index entries */
+	Sort		   *index_sort;		/* ordering information for index */
+	Tuplesortstate *mergestate;		/* state for merging projected tuples if
+									 * spill occurred */
 
 	/* support for evaluation of agg input expressions: */
-#define FIELDNO_AGGSTATE_ALL_PERGROUPS 54
+#define FIELDNO_AGGSTATE_ALL_PERGROUPS 62
 	AggStatePerGroup *all_pergroups;	/* array of first ->pergroups, than
 										 * ->hash_pergroup */
 	SharedAggInfo *shared_info; /* one entry per worker */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fb3957e75e5..b0e2d781c01 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -365,6 +365,7 @@ typedef enum AggStrategy
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
 	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_INDEX,					/* grouped agg, build index for input */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..b19dacf5de4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1219,7 +1219,7 @@ typedef struct Agg
 	/* grouping sets to use */
 	List	   *groupingSets;
 
-	/* chained Agg/Sort nodes */
+	/* chained Agg/Sort nodes, for AGG_INDEX contains single Sort node */
 	List	   *chain;
 } Agg;
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 0bf55902aa1..f372c3e7e0a 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -475,6 +475,21 @@ extern GinTuple *tuplesort_getgintuple(Tuplesortstate *state, Size *len,
 									   bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
-
+/*
+ * Special state for merge mode.
+ */
+extern Tuplesortstate *tuplemerge_begin_common(int workMem,
+											   SortCoordinate coordinate);
+extern Tuplesortstate *tuplemerge_begin_heap(TupleDesc tupDesc,
+											int nkeys, AttrNumber *attNums,
+											Oid *sortOperators, Oid *sortCollations,
+											bool *nullsFirstFlags,
+											int workMem, SortCoordinate coordinate);
+extern void tuplemerge_start_run(Tuplesortstate *state);
+extern void tuplemerge_end_run(Tuplesortstate *state);
+extern void tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple,
+									   Size tuplen);
+extern void tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot);
+extern void tuplemerge_performmerge(Tuplesortstate *state);
 
 #endif							/* TUPLESORT_H */
-- 
2.43.0

#6Sergey Soloviev
sergey.soloviev@tantorlabs.ru
In reply to: Sergey Soloviev (#5)
4 attachment(s)
Re: Introduce Index Aggregate - new GROUP BY strategy

Hi!

I have looked again at the planner's code and found mistakes in the cost calculation:

1. There was an extra `LOG2(numGroups)` multiplier that was meant to account for
    the height of the btree index, but it turned out to be superfluous. The cost is
    now calculated much like sort's: input_tuples * (2.0 * cpu_operator_cost * numGroupCols).
2. IndexAgg requires spilling the index to disk to preserve the sort order, but the
    code that calculates this cost used the row count without adjusting for HAVING quals.
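To make the revised formula concrete, here is a minimal standalone sketch; the function name and the constant used below are illustrative only, not part of the patch:

```c
#include <assert.h>
#include <math.h>

/*
 * Illustrative sketch of the revised comparison-cost term (hypothetical
 * helper, not from the patch): charge two operator evaluations per
 * grouping column per input tuple, just like sort, with no extra
 * LOG2(numGroups) factor.
 */
static double
index_agg_compare_cost(double input_tuples, int numGroupCols,
					   double cpu_operator_cost)
{
	return input_tuples * (2.0 * cpu_operator_cost * numGroupCols);
}
```

With the default cpu_operator_cost of 0.0025 and two grouping columns, 1000 input tuples cost 10.0, independent of numGroups.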

After fixing these parts, more plans started to use the Index Aggregate node.
The new patches have these fixes.

The patches also contain several minor fixes for compiler warnings that I did
not pay attention to during development, but the CI pipeline complained about.

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com

Attachments:

v2-0001-add-in-memory-btree-tuple-index.patchtext/x-patch; charset=UTF-8; name=v2-0001-add-in-memory-btree-tuple-index.patchDownload
From e7db0d354de3bc8f4f6b7bcc4a273b15f623ba5e Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 15:25:41 +0300
Subject: [PATCH v2 1/4] add in-memory btree tuple index

This patch implements an in-memory B+tree structure. It will be used as
the index for the new index-based grouping strategy.

The size of each node is set using a macro. For convenience it equals
2^n - 1, so for internal nodes we can effectively calculate the size of
each page and find the split entry (exactly in the middle), and for leaf
nodes we can distribute tuples among nodes uniformly (according to the
newly inserted tuple).

It supports separate memory contexts for tracking memory allocations.
And just like TupleHashTable, lookup takes an 'isnew' pointer that can
be used to prevent new tuple creation (e.g. when the memory limit is
reached).

It also supports the key abbreviation optimization, like tuplesort. But
some of that code was copied and looks exactly the same, so it is worth
separating such logic into a shared function.
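As a quick illustration of why a 2^n - 1 capacity makes splits symmetric (the capacity value below is made up for the example; the real one comes from TUPLE_INDEX_NODE_MAX_ENTRIES):

```c
#include <assert.h>

/*
 * With an odd number of entries (2^n - 1), the midpoint split leaves the
 * same number of entries on both sides of the split entry.  The capacity
 * here is illustrative only.
 */
#define NODE_MAX_ENTRIES 15		/* 2^4 - 1 */

static int
node_split_point(void)
{
	/* entries 0..6 go left, entry 7 is the split, 8..14 go right */
	return NODE_MAX_ENTRIES / 2;
}
```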
---
 src/backend/executor/execGrouping.c | 643 ++++++++++++++++++++++++++++
 src/include/executor/executor.h     |  65 +++
 src/include/nodes/execnodes.h       |  86 +++-
 3 files changed, 793 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 8eb4c25e1cb..c83a3f2223d 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -622,3 +622,646 @@ TupleHashTableMatch(struct tuplehash_hash *tb, MinimalTuple tuple1, MinimalTuple
 	econtext->ecxt_outertuple = slot1;
 	return !ExecQualAndReset(hashtable->cur_eq_func, econtext);
 }
+
+/*****************************************************************************
+ * 		Utility routines for all-in-memory btree index
+ * 
+ * These routines build a btree index for grouping tuples together (e.g. for
+ * index aggregation).  There is one entry for each not-distinct set of tuples
+ * presented.
+ *****************************************************************************/
+
+/*
+ * Representation of a searched entry in the tuple index.  This has a
+ * separate representation to avoid unnecessary memory allocations to
+ * create a MinimalTuple for a TupleIndexEntry.
+ */
+typedef struct TupleIndexSearchEntryData
+{
+	TupleTableSlot *slot;		/* search TupleTableSlot */
+	Datum	key1;				/* first searched key data */
+	bool	isnull1;			/* first searched key is null */
+} TupleIndexSearchEntryData;
+
+typedef TupleIndexSearchEntryData *TupleIndexSearchEntry;
+
+/*
+ * compare_index_tuple_tiebreak
+ * 		Perform full comparison of tuples without key abbreviation.
+ *
+ * Invoked if the first key (possibly abbreviated) cannot decide the
+ * comparison, so we have to compare all keys.
+ */
+static inline int
+compare_index_tuple_tiebreak(TupleIndex index, TupleIndexEntry left,
+							 TupleIndexSearchEntry right)
+{
+	HeapTupleData ltup;
+	SortSupport sortKey = index->sortKeys;
+	TupleDesc tupDesc = index->tupDesc;
+	AttrNumber	attno;
+	Datum		datum1,
+				datum2;
+	bool		isnull1,
+				isnull2;
+	int			cmp;
+
+	ltup.t_len = left->tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	ltup.t_data = (HeapTupleHeader) ((char *) left->tuple - MINIMAL_TUPLE_OFFSET);
+	tupDesc = index->tupDesc;
+
+	if (sortKey->abbrev_converter)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortAbbrevFullComparator(datum1, isnull1,
+											datum2, isnull2,
+											sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	sortKey++;
+	for (int nkey = 1; nkey < index->nkeys; nkey++, sortKey++)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortComparator(datum1, isnull1,
+								  datum2, isnull2,
+								  sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+	
+	return 0;
+}
+
+/*
+ * compare_index_tuple
+ * 		Compare a pair of tuples during index lookup.
+ *
+ * The comparison honors key abbreviation.
+ */
+static int
+compare_index_tuple(TupleIndex index,
+					TupleIndexEntry left,
+					TupleIndexSearchEntry right)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+	int	cmp = 0;
+	
+	cmp = ApplySortComparator(left->key1, left->isnull1,
+							  right->key1, right->isnull1,
+							  sortKey);
+	if (cmp != 0)
+		return cmp;
+
+	return compare_index_tuple_tiebreak(index, left, right);
+}
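The two-tier comparison above (abbreviated key first, full comparator only on ties) can be sketched outside the executor as follows. This is a minimal standalone model: the names and the 8-byte big-endian prefix scheme are illustrative, not the per-datatype abbreviation the patch relies on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack up to 8 leading bytes big-endian, so unsigned integer order
 * matches memcmp order on the prefix.
 */
static uint64_t
demo_abbrev_key(const char *s)
{
	uint64_t	key = 0;

	for (int i = 0; i < 8 && s[i] != '\0'; i++)
		key |= (uint64_t) (unsigned char) s[i] << (8 * (7 - i));
	return key;
}

static int
demo_compare(const char *a, const char *b)
{
	uint64_t	ka = demo_abbrev_key(a);
	uint64_t	kb = demo_abbrev_key(b);

	if (ka != kb)
		return ka < kb ? -1 : 1;
	/* abbreviated keys tie: fall back to the full comparison */
	return strcmp(a, b);
}
```

Most comparisons resolve on the cheap integer compare; only strings sharing an 8-byte prefix pay for the full comparator, which mirrors how compare_index_tuple defers to compare_index_tuple_tiebreak.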
+
+/* 
+ * tuple_index_node_bsearch
+ * 		Perform a binary search in the index node.
+ * 
+ * On return, if 'found' is set to 'true', an exact match was found and the
+ * result is an index into the tuples array.  Otherwise:
+ * - for internal nodes it is the index of the 'pointers' entry to follow,
+ * - for leaf nodes it is the index at which the new entry must be inserted.
+ */
+static int
+tuple_index_node_bsearch(TupleIndex index, TupleIndexNode node,
+						 TupleIndexSearchEntry search, bool *found)
+{
+	int low;
+	int high;
+	
+	low = 0;
+	high = node->ntuples;
+	*found = false;
+
+	while (low < high)
+	{
+		int mid = (low + high) / 2;
+		TupleIndexEntry mid_entry = node->tuples[mid];
+		int cmp;
+
+		cmp = compare_index_tuple(index, mid_entry, search);
+		if (cmp == 0)
+		{
+			*found = true;
+			return mid;
+		}
+
+		if (cmp < 0)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	return low;
+}
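The return convention of tuple_index_node_bsearch can be modeled on a plain int array (a standalone sketch; names are illustrative): a hit returns the match index with *found set, a miss returns the lower-bound insertion position.

```c
#include <assert.h>
#include <stdbool.h>

/* Lower-bound binary search with the same hit/miss convention. */
static int
demo_node_bsearch(const int *keys, int nkeys, int target, bool *found)
{
	int			low = 0;
	int			high = nkeys;

	*found = false;
	while (low < high)
	{
		int			mid = (low + high) / 2;

		if (keys[mid] == target)
		{
			*found = true;
			return mid;
		}
		if (keys[mid] < target)
			low = mid + 1;
		else
			high = mid;
	}
	return low;				/* insertion position */
}

static const int demo_keys[] = {10, 20, 30};

static int
demo_pos(int target)
{
	bool		found;

	return demo_node_bsearch(demo_keys, 3, target, &found);
}

static bool
demo_hit(int target)
{
	bool		found;

	demo_node_bsearch(demo_keys, 3, target, &found);
	return found;
}
```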
+
+static inline TupleIndexNode
+IndexLeafNodeGetNext(TupleIndexNode node)
+{
+	return node->pointers[0];
+}
+
+static inline void
+IndexLeafNodeSetNext(TupleIndexNode node, TupleIndexNode next)
+{
+	node->pointers[0] = next;
+}
+
+#define SizeofTupleIndexInternalNode \
+	  (offsetof(TupleIndexNodeData, pointers) \
+	+ (TUPLE_INDEX_NODE_MAX_ENTRIES + 1) * sizeof(TupleIndexNode))
+
+#define SizeofTupleIndexLeafNode \
+	offsetof(TupleIndexNodeData, pointers) + sizeof(TupleIndexNode)
+
+static inline TupleIndexNode
+AllocLeafIndexNode(TupleIndex index, TupleIndexNode next)
+{
+	TupleIndexNode leaf;
+	leaf = MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexLeafNode);
+	IndexLeafNodeSetNext(leaf, next);
+	return leaf;
+}
+
+static inline TupleIndexNode
+AllocInternalIndexNode(TupleIndex index)
+{
+	return MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexInternalNode);
+}
+
+/* 
+ * tuple_index_node_insert_at
+ * 		Insert a new tuple into the node at the specified index
+ * 
+ * This function is invoked when a new tuple must be inserted into a node
+ * (both leaf and internal).  For internal nodes 'pointer' must also be given.
+ *
+ * The node must have free space; it's up to the caller to check whether the
+ * node is full and needs splitting, using 'tuple_index_insert_split'.
+ */
+static inline void
+tuple_index_node_insert_at(TupleIndexNode node, bool is_leaf, int idx,
+						   TupleIndexEntry entry, TupleIndexNode pointer)
+{
+	int move_count;
+
+	Assert(node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES);
+	Assert(0 <= idx && idx <= node->ntuples);
+	move_count = node->ntuples - idx;
+
+	if (move_count > 0)
+		memmove(&node->tuples[idx + 1], &node->tuples[idx],
+			move_count * sizeof(TupleIndexEntry));
+
+	node->tuples[idx] = entry;
+
+	if (!is_leaf)
+	{
+		Assert(pointer != NULL);
+
+		if (move_count > 0)
+			memmove(&node->pointers[idx + 2], &node->pointers[idx + 1],
+					move_count * sizeof(TupleIndexNode));
+		node->pointers[idx + 1] = pointer;
+	}
+
+	node->ntuples++;
+}
+
+/* 
+ * Insert a tuple into a full node, performing a page split.
+ * 
+ * 'split_node_out' - new page containing the entries on the right side
+ * 'split_entry_out' - tuple sent up to the parent as the new separator key
+ */
+static void
+tuple_index_insert_split(TupleIndex index, TupleIndexNode node, bool is_leaf,
+						 int insert_pos, TupleIndexNode *split_node_out,
+						 TupleIndexEntry *split_entry_out)
+{
+	TupleIndexNode split;
+	int split_tuple_idx;
+
+	Assert(node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+	if (is_leaf)
+	{
+		/* 
+		 * The maximum number of tuples is kept odd, so we must decide at
+		 * which index to split the page.  We know the split happens during
+		 * an insert, so leave fewer entries in the half that will receive
+		 * the insertion, keeping both halves balanced afterwards.
+		 */
+		if (TUPLE_INDEX_NODE_MAX_ENTRIES / 2 < insert_pos)
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2 + 1;
+		else
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+
+		split = AllocLeafIndexNode(index, IndexLeafNodeGetNext(node));
+		split->ntuples = node->ntuples - split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[node->ntuples], 
+			   sizeof(TupleIndexEntry) * split->ntuples);
+		IndexLeafNodeSetNext(node, split);
+	}
+	else
+	{
+		/* 
+		 * After a split of an internal node the split tuple moves up to the
+		 * parent.  The max number of tuples is odd, so division by 2 works.
+		 */
+		split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+		split = AllocInternalIndexNode(index);
+		split->ntuples = split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[split_tuple_idx + 1],
+				sizeof(TupleIndexEntry) * split->ntuples);
+		memcpy(&split->pointers[0], &node->pointers[split_tuple_idx + 1],
+				sizeof(TupleIndexNode) * (split->ntuples + 1));
+	}
+
+	*split_node_out = split;
+	*split_entry_out = node->tuples[split_tuple_idx];
+}
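The leaf split-point choice above can be checked in isolation. A standalone sketch (the macro name is an illustrative stand-in for TUPLE_INDEX_NODE_MAX_ENTRIES): the half that will receive the pending insert is left one entry short, so with 63 entries both halves hold 32 once the insert completes.

```c
#include <assert.h>

#define DEMO_MAX_ENTRIES 63		/* stand-in for TUPLE_INDEX_NODE_MAX_ENTRIES */

/*
 * Number of entries that stay in the left (original) leaf, given where
 * the pending insert will land.
 */
static int
demo_leaf_split_point(int insert_pos)
{
	if (DEMO_MAX_ENTRIES / 2 < insert_pos)
		return DEMO_MAX_ENTRIES / 2 + 1;	/* insert goes right */
	return DEMO_MAX_ENTRIES / 2;			/* insert goes left */
}
```

For insert_pos <= 31 the left half keeps 31 entries and gains the insert (32 vs 32); for insert_pos > 31 the left keeps 32 and the right half of 31 gains the insert (32 vs 32).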
+
+static inline Datum
+mintup_getattr(MinimalTuple tup, TupleDesc tupdesc, AttrNumber attnum, bool *isnull)
+{
+	HeapTupleData htup;
+
+	htup.t_len = tup->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tup - MINIMAL_TUPLE_OFFSET);
+
+	return heap_getattr(&htup, attnum, tupdesc, isnull);
+}
+
+static TupleIndexEntry
+tuple_index_node_lookup(TupleIndex index,
+						TupleIndexNode node, int level,
+						TupleIndexSearchEntry search, bool *is_new,
+						TupleIndexNode *split_node_out,
+						TupleIndexEntry *split_entry_out)
+{
+	TupleIndexEntry entry;
+	int idx;
+	bool found;
+	bool is_leaf;
+
+	TupleIndexNode insert_pointer;
+	TupleIndexEntry insert_entry;
+	bool need_insert;
+
+	Assert(level >= 0);
+
+	idx = tuple_index_node_bsearch(index, node, search, &found);
+	if (found)
+	{
+		/* 
+		 * Both internal and leaf nodes store pointers to elements, so we can
+		 * safely return exact match found at each level.
+		 */
+		if (is_new)
+			*is_new = false;
+		return node->tuples[idx];
+	}
+
+	is_leaf = level == 0;
+	if (is_leaf)
+	{
+		MemoryContext oldcxt;
+
+		if (is_new == NULL)
+			return NULL;
+
+		oldcxt = MemoryContextSwitchTo(index->tuplecxt);
+
+		entry = palloc(sizeof(TupleIndexEntryData));
+		entry->tuple = ExecCopySlotMinimalTupleExtra(search->slot, index->additionalsize);
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/* 
+		 * key1 of the search tuple points into a TupleTableSlot, which has
+		 * its own lifetime, so we must not copy it directly.
+		 * 
+		 * But if key abbreviation is in use, then we should copy it from the
+		 * search tuple: this is safe (abbreviated keys are pass-by-value) and
+		 * recomputing it would skew the abbreviation statistics.
+		 */
+		if (index->sortKeys->abbrev_converter)
+		{
+			entry->isnull1 = search->isnull1;
+			entry->key1 = search->key1;
+		}
+		else
+		{
+			SortSupport sortKey = &index->sortKeys[0];
+			entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+										 sortKey->ssup_attno, &entry->isnull1);
+		}
+
+		index->ntuples++;
+
+		*is_new = true;
+		need_insert = true;
+		insert_pointer = NULL;
+		insert_entry = entry;
+	}
+	else
+	{
+		TupleIndexNode child_split_node = NULL;
+		TupleIndexEntry child_split_entry;
+
+		entry = tuple_index_node_lookup(index, node->pointers[idx], level - 1,
+										search, is_new,
+										&child_split_node, &child_split_entry);
+		if (entry == NULL)
+			return NULL;
+
+		if (child_split_node != NULL)
+		{
+			need_insert = true;
+			insert_pointer = child_split_node;
+			insert_entry = child_split_entry;
+		}
+		else
+			need_insert = false;
+	}
+	
+	if (need_insert)
+	{
+		Assert(insert_entry != NULL);
+
+		if (node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES)
+		{
+			TupleIndexNode split_node;
+			TupleIndexEntry split_entry;
+
+			tuple_index_insert_split(index, node, is_leaf, idx,
+									 &split_node, &split_entry);
+
+			/* adjust insertion index if tuple is inserted into the split page */
+			if (node->ntuples < idx)
+			{
+				/* keep split tuple for leaf nodes and remove for internal */
+				if (is_leaf)
+					idx -= node->ntuples;
+				else
+					idx -= node->ntuples + 1;
+
+				node = split_node;
+			}
+
+			*split_node_out = split_node;
+			*split_entry_out = split_entry;
+		}
+
+		Assert(idx >= 0);
+		tuple_index_node_insert_at(node, is_leaf, idx, insert_entry, insert_pointer);
+	}
+
+	return entry;
+}
+
+static void
+remove_index_abbreviations(TupleIndex index)
+{
+	TupleIndexIteratorData iter;
+	TupleIndexEntry entry;
+	SortSupport sortKey = &index->sortKeys[0];
+
+	sortKey->comparator = sortKey->abbrev_full_comparator;
+	sortKey->abbrev_converter = NULL;
+	sortKey->abbrev_abort = NULL;
+	sortKey->abbrev_full_comparator = NULL;
+
+	/* now traverse all index entries and convert all existing keys */
+	InitTupleIndexIterator(index, &iter);
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 sortKey->ssup_attno, &entry->isnull1);
+}
+
+static inline void
+prepare_search_index_tuple(TupleIndex index, TupleTableSlot *slot,
+						   TupleIndexSearchEntry entry)
+{
+	SortSupport	sortKey;
+
+	sortKey = &index->sortKeys[0];
+
+	entry->slot = slot;
+	entry->key1 = slot_getattr(slot, sortKey->ssup_attno, &entry->isnull1);
+
+	/* NULL can not be abbreviated */
+	if (entry->isnull1)
+		return;
+
+	/* abbreviation is not used */
+	if (!sortKey->abbrev_converter)
+		return;
+
+	/* check if abbreviation should be removed */
+	if (index->abbrevNext <= index->ntuples)
+	{
+		index->abbrevNext *= 2;
+
+		if (sortKey->abbrev_abort(index->ntuples, sortKey))
+		{
+			remove_index_abbreviations(index);
+			return;
+		}
+	}
+
+	entry->key1 = sortKey->abbrev_converter(entry->key1, sortKey);
+}
+
+TupleIndexEntry
+TupleIndexLookup(TupleIndex index, TupleTableSlot *searchslot, bool *is_new)
+{
+	TupleIndexEntry entry;
+	TupleIndexSearchEntryData search_entry;
+	TupleIndexNode split_node = NULL;
+	TupleIndexEntry split_entry;
+	TupleIndexNode new_root;
+
+	prepare_search_index_tuple(index, searchslot, &search_entry);
+
+	entry = tuple_index_node_lookup(index, index->root, index->height,
+									&search_entry, is_new, &split_node, &split_entry);
+
+	if (entry == NULL)
+		return NULL;
+
+	if (split_node == NULL)
+		return entry;
+
+	/* root split */
+	new_root = AllocInternalIndexNode(index);
+	new_root->ntuples = 1;
+	new_root->tuples[0] = split_entry;
+	new_root->pointers[0] = index->root;
+	new_root->pointers[1] = split_node;
+	index->root = new_root;
+	index->height++;
+
+	return entry;
+}
+
+void
+InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	TupleIndexNode min_node;
+	int level;
+
+	/* iterate to the left-most node */
+	min_node = index->root;
+	level = index->height;
+	while (level-- > 0)
+		min_node = min_node->pointers[0];
+
+	iter->cur_leaf = min_node;
+	iter->cur_idx = 0;
+}
+
+TupleIndexEntry
+TupleIndexIteratorNext(TupleIndexIterator iter)
+{
+	TupleIndexNode leaf = iter->cur_leaf;
+	TupleIndexEntry tuple;
+
+	if (leaf == NULL)
+		return NULL;
+
+	/* this also handles single empty root node case */
+	if (leaf->ntuples <= iter->cur_idx)
+	{
+		leaf = iter->cur_leaf = IndexLeafNodeGetNext(leaf);
+		if (leaf == NULL)
+			return NULL;
+		iter->cur_idx = 0;
+	}
+
+	tuple = leaf->tuples[iter->cur_idx];
+	iter->cur_idx++;
+	return tuple;
+}
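The iterator above walks a sibling-linked chain of leaves. A toy model with illustrative types (not the executor's) shows the same advance-to-next-leaf logic:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoLeaf
{
	int			ntuples;
	int			tuples[4];
	struct DemoLeaf *next;		/* sibling pointer, like pointers[0] */
} DemoLeaf;

typedef struct DemoIter
{
	DemoLeaf   *cur_leaf;
	int			cur_idx;
} DemoIter;

static const int *
demo_iter_next(DemoIter *it)
{
	DemoLeaf   *leaf = it->cur_leaf;

	if (leaf == NULL)
		return NULL;

	/* exhausted the current leaf: advance along the sibling chain */
	if (leaf->ntuples <= it->cur_idx)
	{
		leaf = it->cur_leaf = leaf->next;
		if (leaf == NULL)
			return NULL;
		it->cur_idx = 0;
	}

	return &leaf->tuples[it->cur_idx++];
}

/* iterate a two-leaf chain and sum the values, to exercise the iterator */
static int
demo_sum_chain(void)
{
	DemoLeaf	b = {2, {3, 4}, NULL};
	DemoLeaf	a = {2, {1, 2}, &b};
	DemoIter	it = {&a, 0};
	const int  *p;
	int			sum = 0;

	while ((p = demo_iter_next(&it)) != NULL)
		sum += *p;
	return sum;
}
```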
+
+/* 
+ * Construct an empty TupleIndex
+ *
+ * inputDesc: tuple descriptor for input tuples
+ * nkeys: number of columns to be compared (length of the next 4 arrays)
+ * attNums: attribute numbers used for grouping, in sort order
+ * sortOperators: Oids of the ordering operators used for comparisons
+ * sortCollations: collations used for comparisons
+ * nullsFirstFlags: whether NULLs sort first, per column
+ * additionalsize: size of data that may be stored along with the index entry;
+ * 				   used for storing per-trans information during aggregation
+ * metacxt: memory context for TupleIndex itself
+ * tuplecxt: memory context for storing MinimalTuples
+ * nodecxt: memory context for storing index nodes
+ */
+TupleIndex
+BuildTupleIndex(TupleDesc inputDesc,
+				int nkeys,
+				AttrNumber *attNums,
+				Oid *sortOperators,
+				Oid *sortCollations,
+				bool *nullsFirstFlags,
+				Size additionalsize,
+				MemoryContext metacxt,
+				MemoryContext tuplecxt,
+				MemoryContext nodecxt)
+{
+	TupleIndex index;
+	MemoryContext oldcxt;
+
+	Assert(nkeys > 0);
+
+	additionalsize = MAXALIGN(additionalsize);
+
+	oldcxt = MemoryContextSwitchTo(metacxt);
+
+	index = (TupleIndex) palloc(sizeof(TupleIndexData));
+	index->tuplecxt = tuplecxt;
+	index->nodecxt = nodecxt;
+	index->additionalsize = additionalsize;
+	index->tupDesc = CreateTupleDescCopy(inputDesc);
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->ntuples = 0;
+	index->height = 0;
+
+	index->nkeys = nkeys;
+	index->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (int i = 0; i < nkeys; ++i)
+	{
+		SortSupport sortKey = &index->sortKeys[i];
+
+		Assert(AttributeNumberIsValid(attNums[i]));
+		Assert(OidIsValid(sortOperators[i]));
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* abbreviation applies only for the first key */
+		sortKey->abbreviate = i == 0;
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/* Update abbreviation information */
+	if (index->sortKeys[0].abbrev_converter != NULL)
+	{
+		index->abbrevUsed = true;
+		index->abbrevNext = 10;
+		index->abbrevSortOp = sortOperators[0];
+	}
+	else
+		index->abbrevUsed = false;
+
+	MemoryContextSwitchTo(oldcxt);
+	return index;
+}
+
+/* 
+ * Resets contents of the index to be empty, preserving all the non-content
+ * state.
+ */
+void
+ResetTupleIndex(TupleIndex index)
+{
+	SortSupport ssup;
+
+	/* by this time tuplecxt and nodecxt must have been reset by the caller */
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->height = 0;
+	index->ntuples = 0;
+	
+	if (!index->abbrevUsed)
+		return;
+
+	/* 
+	 * If key abbreviation is used then we must reset its state.
+	 * All fields in SortSupport are already set up, but we should clear
+	 * some of them to make it look as if we set this up for the first time.
+	 */
+	ssup = &index->sortKeys[0];
+	ssup->comparator = NULL;
+	PrepareSortSupportFromOrderingOp(index->abbrevSortOp, ssup);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..6192cc8d143 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -198,6 +198,71 @@ TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
 }
 #endif
 
+extern TupleIndex BuildTupleIndex(TupleDesc inputDesc,
+								  int nkeys,
+								  AttrNumber *attNums,
+								  Oid *sortOperators,
+								  Oid *sortCollations,
+								  bool *nullsFirstFlags,
+								  Size additionalsize,
+								  MemoryContext metacxt,
+								  MemoryContext tablecxt,
+								  MemoryContext nodecxt);
+extern TupleIndexEntry TupleIndexLookup(TupleIndex index, TupleTableSlot *search,
+										bool *is_new);
+extern void ResetTupleIndex(TupleIndex index);
+
+/* 
+ * Start iteration over tuples in the index, ascending direction only.  During
+ * iteration no modifications are allowed, because they can break the iterator.
+ */
+extern void	InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter);
+extern TupleIndexEntry TupleIndexIteratorNext(TupleIndexIterator iter);
+static inline void
+ResetTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	InitTupleIndexIterator(index, iter);
+}
+
+#ifndef FRONTEND
+
+/* 
+ * Return size of the index entry. Useful for estimating memory usage.
+ */
+static inline size_t
+TupleIndexEntrySize(void)
+{
+	return sizeof(TupleIndexEntryData);
+}
+
+/* 
+ * Get a pointer to the additional space allocated for this entry. The
+ * memory will be maxaligned and zeroed.
+ * 
+ * The amount of space available is the additionalsize requested in the call
+ * to BuildTupleIndex(). If additionalsize was specified as zero, return
+ * NULL.
+ */
+static inline void *
+TupleIndexEntryGetAdditional(TupleIndex index, TupleIndexEntry entry)
+{
+	if (index->additionalsize > 0)
+		return (char *) (entry->tuple) - index->additionalsize;
+	else
+		return NULL;
+}
+
+/* 
+ * Return tuple from index entry
+ */
+static inline MinimalTuple
+TupleIndexEntryGetMinimalTuple(TupleIndexEntry entry)
+{
+	return entry->tuple;
+}
+
+#endif
+
 /*
  * prototypes from functions in execJunk.c
  */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..99ee472b51f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -900,7 +900,91 @@ typedef tuplehash_iterator TupleHashIterator;
 #define ScanTupleHashTable(htable, iter) \
 	tuplehash_iterate(htable->hashtab, iter)
 
-
+/* ---------------------------------------------------------------
+ * 				Tuple Btree index
+ *
+ * All-in-memory tuple Btree index used for grouping and aggregating.
+ * ---------------------------------------------------------------
+ */
+
+/* 
+ * Representation of a tuple in the index.  It stores both the tuple and
+ * the first key.  If key abbreviation is used, then the first key holds
+ * the abbreviated key.
+ */
+typedef struct TupleIndexEntryData
+{
+	MinimalTuple tuple;	/* actual stored tuple */
+	Datum	key1;		/* value of first key */
+	bool	isnull1;	/* first key is null */
+} TupleIndexEntryData;
+
+typedef TupleIndexEntryData *TupleIndexEntry;
+
+/* 
+ * Btree node of the tuple index.  Common to both internal and leaf nodes.
+ */
+typedef struct TupleIndexNodeData
+{
+	/* amount of tuples in the node */
+	int ntuples;
+
+/* 
+ * Maximum number of tuples stored in a tuple index node.
+ *
+ * NOTE: use a 2^n - 1 count so that all tuples fully utilize cache lines
+ *       (except the first, because of 'ntuples' padding)
+ */
+#define TUPLE_INDEX_NODE_MAX_ENTRIES  63
+
+	/* 
+	 * Array of tuples for this page.
+	 * 
+	 * For internal nodes these are separator keys;
+	 * for leaf nodes, the actual tuples.
+	 */
+	TupleIndexEntry tuples[TUPLE_INDEX_NODE_MAX_ENTRIES];
+
+	/* 
+	 * For internal nodes this is an array of size
+	 * TUPLE_INDEX_NODE_MAX_ENTRIES + 1 holding pointers to the nodes below.
+	 * 
+	 * For leaf nodes this is an array of one element: a pointer to the
+	 * sibling node, required for iteration.
+	 */
+	struct TupleIndexNodeData *pointers[FLEXIBLE_ARRAY_MEMBER];
+} TupleIndexNodeData;
+
+typedef TupleIndexNodeData *TupleIndexNode;
+
+typedef struct TupleIndexData
+{
+	TupleDesc	tupDesc;		/* descriptor for stored tuples */
+	TupleIndexNode root;		/* root of the tree */
+	int		height;				/* current tree height */
+	int		ntuples;			/* number of tuples in index */
+	int		nkeys;				/* number of key columns in tuples */
+	SortSupport	sortKeys;		/* support functions for key comparison */
+	MemoryContext	tuplecxt;	/* memory context containing tuples */
+	MemoryContext	nodecxt;	/* memory context containing index nodes */
+	Size	additionalsize;		/* size of additional data for each tuple */
+	int		abbrevNext;			/* next ntuples count at which to check
+								 * abbreviation optimization efficiency */
+	bool	abbrevUsed;			/* true if key abbreviation optimization
+								 * was ever used */
+	Oid		abbrevSortOp;		/* sort operator for first key */
+} TupleIndexData;
+
+typedef struct TupleIndexData *TupleIndex;
+
+typedef struct TupleIndexIteratorData
+{
+	TupleIndexNode	cur_leaf;	/* current leaf node */
+	OffsetNumber	cur_idx;	/* index of tuple to return next */
+} TupleIndexIteratorData;
+
+typedef TupleIndexIteratorData *TupleIndexIterator;
+
 /* ----------------------------------------------------------------
  *				 Expression State Nodes
  *
-- 
2.43.0

Attachment: v2-0002-introduce-AGG_INDEX-grouping-strategy-node.patch (text/x-patch)
From 2986764514f2310bfe2d1d7d2eacb4e4096e76f8 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 16:41:58 +0300
Subject: [PATCH v2 2/4] introduce AGG_INDEX grouping strategy node

AGG_INDEX is a new grouping strategy that builds an in-memory index and
uses it for grouping. The main advantage of this approach is that the
output is ordered by the grouping columns; if an ORDER BY is specified,
it is used when choosing the grouping/sorting columns.

The index is the B+tree implemented in the previous commit, and the
overall implementation is very close to AGG_HASHED:

- maintain in-memory grouping structure
- track memory consuption
- if memory limit reached spill data to disk in batches (using hash of
  key columns)
- hash batches are processed one after another and for each batch fill
  new in-memory structure

For this reason much of the code is generalized to support both the index
and hash implementations: functions that took boolean arguments (e.g.
'ishash') now take a strategy, spill-logic members in AggState are renamed
with the prefix 'spill_' instead of 'hash_', etc.

Most differences are in the spill logic: to preserve sort order in case of
a disk spill we must dump all indexes to disk to create sorted runs and
perform a final external merge.

One problem is the external merge. It is adapted from tuplesort.c by
introducing a new operational mode - tuplemerge (with its own prefix).
Internally we just set up the state accordingly and proceed as before
without any significant code changes.

Another problem is which tuples to save into the sorted runs. We decided
to store tuples after projection (when their aggregates are finalized),
because the internal transition info is represented by a
value/isnull/noTransValue triple (in AggStatePerGroupData) which is quite
hard to serialize and handle. After projection, all GROUP BY attributes
are preserved, so we can access them during the merge. Also, projection
applies the filter, so it can discard some tuples.
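The hash-based batch routing mentioned above ("spill data to disk in batches using a hash of the key columns") can be sketched as follows. This is a standalone illustration of the idea, assuming hashagg-style bit consumption; the name and bit widths are hypothetical, and the real code derives partition counts from the limits machinery.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Route a spilled tuple to a batch: each spill generation consumes the
 * next npartition_bits bits of the group hash, above the bits already
 * used by earlier generations, so re-spills subdivide cleanly.
 */
static int
demo_spill_partition(uint32_t hash, int used_bits, int npartition_bits)
{
	return (int) ((hash >> used_bits) & ((1u << npartition_bits) - 1));
}
```

Because each generation reads a disjoint bit range, a batch that spills again while being refilled splits into sub-batches without ever re-mixing groups from other batches.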
---
 src/backend/executor/execExpr.c            |   31 +-
 src/backend/executor/nodeAgg.c             | 1378 +++++++++++++++++---
 src/backend/utils/sort/tuplesort.c         |  209 ++-
 src/backend/utils/sort/tuplesortvariants.c |  105 ++
 src/include/executor/executor.h            |   10 +-
 src/include/executor/nodeAgg.h             |   33 +-
 src/include/nodes/execnodes.h              |   61 +-
 src/include/nodes/nodes.h                  |    1 +
 src/include/nodes/plannodes.h              |    2 +-
 src/include/utils/tuplesort.h              |   17 +-
 10 files changed, 1618 insertions(+), 229 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c35744b105e..117d7ba31d0 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -94,7 +94,7 @@ static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 								  ExprEvalStep *scratch,
 								  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-								  int transno, int setno, int setoff, bool ishash,
+								  int transno, int setno, int setoff, int strategy,
 								  bool nullcheck);
 static void ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 							 Datum *resv, bool *resnull,
@@ -3667,7 +3667,7 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
  */
 ExprState *
 ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
-				  bool doSort, bool doHash, bool nullcheck)
+				  int groupStrategy, bool nullcheck)
 {
 	ExprState  *state = makeNode(ExprState);
 	PlanState  *parent = &aggstate->ss.ps;
@@ -3925,7 +3925,7 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		 * grouping set). Do so for both sort and hash based computations, as
 		 * applicable.
 		 */
-		if (doSort)
+		if (groupStrategy & GROUPING_STRATEGY_SORT)
 		{
 			int			processGroupingSets = Max(phase->numsets, 1);
 			int			setoff = 0;
@@ -3933,13 +3933,13 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < processGroupingSets; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, false,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_SORT, nullcheck);
 				setoff++;
 			}
 		}
 
-		if (doHash)
+		if (groupStrategy & GROUPING_STRATEGY_HASH)
 		{
 			int			numHashes = aggstate->num_hashes;
 			int			setoff;
@@ -3953,12 +3953,19 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < numHashes; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, true,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_HASH, nullcheck);
 				setoff++;
 			}
 		}
 
+		if (groupStrategy & GROUPING_STRATEGY_INDEX)
+		{
+			ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
+								  pertrans, transno, 0, 0,
+								  GROUPING_STRATEGY_INDEX, nullcheck);
+		}
+
 		/* adjust early bail out jump target(s) */
 		foreach(bail, adjust_bailout)
 		{
@@ -4011,16 +4018,18 @@ static void
 ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 					  ExprEvalStep *scratch,
 					  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-					  int transno, int setno, int setoff, bool ishash,
+					  int transno, int setno, int setoff, int strategy,
 					  bool nullcheck)
 {
 	ExprContext *aggcontext;
 	int			adjust_jumpnull = -1;
 
-	if (ishash)
+	if (strategy & GROUPING_STRATEGY_HASH)
 		aggcontext = aggstate->hashcontext;
-	else
+	else if (strategy & GROUPING_STRATEGY_SORT)
 		aggcontext = aggstate->aggcontexts[setno];
+	else
+		aggcontext = aggstate->indexcontext;
 
 	/* add check for NULL pointer? */
 	if (nullcheck)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index a18556f62ec..c5c6b7bfce9 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -364,7 +364,7 @@ typedef struct FindColsContext
 	Bitmapset  *unaggregated;	/* other column references */
 } FindColsContext;
 
-static void select_current_set(AggState *aggstate, int setno, bool is_hash);
+static void select_current_set(AggState *aggstate, int setno, int strategy);
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
 static void initialize_aggregates(AggState *aggstate,
@@ -403,8 +403,8 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
 static bool find_cols_walker(Node *node, FindColsContext *context);
 static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
-static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
-										  bool nullcheck);
+static void agg_recompile_expressions(AggState *aggstate, bool minslot,
+									  bool nullcheck);
 static void hash_create_memory(AggState *aggstate);
 static double hash_choose_num_buckets(double hashentrysize,
 									  double ngroups, Size memory);
@@ -431,13 +431,13 @@ static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
 									   int64 input_tuples, double input_card,
 									   int used_bits);
 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
-static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
-							   int used_bits, double input_groups,
-							   double hashentrysize);
-static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-								TupleTableSlot *inputslot, uint32 hash);
-static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
-								 int setno);
+static void agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
+						   int used_bits, double input_groups,
+						   double hashentrysize);
+static Size agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+							TupleTableSlot *inputslot, uint32 hash);
+static void agg_spill_finish(AggState *aggstate, HashAggSpill *spill,
+							 int setno);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  AggState *aggstate, EState *estate,
@@ -446,21 +446,27 @@ static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  Oid aggdeserialfn, Datum initValue,
 									  bool initValueIsNull, Oid *inputTypes,
 									  int numArguments);
-
+static void agg_fill_index(AggState *state);
+static TupleTableSlot *agg_retrieve_index(AggState *state);
+static void lookup_index_entries(AggState *state);
+static void indexagg_finish_initial_spills(AggState *aggstate);
+static void index_agg_enter_spill_mode(AggState *aggstate);
 
 /*
  * Select the current grouping set; affects current_set and
  * curaggcontext.
  */
 static void
-select_current_set(AggState *aggstate, int setno, bool is_hash)
+select_current_set(AggState *aggstate, int setno, int strategy)
 {
 	/*
 	 * When changing this, also adapt ExecAggPlainTransByVal() and
 	 * ExecAggPlainTransByRef().
 	 */
-	if (is_hash)
+	if (strategy == GROUPING_STRATEGY_HASH)
 		aggstate->curaggcontext = aggstate->hashcontext;
+	else if (strategy == GROUPING_STRATEGY_INDEX)
+		aggstate->curaggcontext = aggstate->indexcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -680,7 +686,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerGroup pergroup = pergroups[setno];
 
-		select_current_set(aggstate, setno, false);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_SORT);
 
 		for (transno = 0; transno < numTrans; transno++)
 		{
@@ -1478,7 +1484,7 @@ build_hash_tables(AggState *aggstate)
 			continue;
 		}
 
-		memory = aggstate->hash_mem_limit / aggstate->num_hashes;
+		memory = aggstate->spill_mem_limit / aggstate->num_hashes;
 
 		/* choose reasonable number of buckets per hashtable */
 		nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
@@ -1496,7 +1502,7 @@ build_hash_tables(AggState *aggstate)
 		build_hash_table(aggstate, setno, nbuckets);
 	}
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 }
 
 /*
@@ -1728,7 +1734,7 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 }
 
 /*
- * hashagg_recompile_expressions()
+ * agg_recompile_expressions()
  *
  * Identifies the right phase, compiles the right expression given the
  * arguments, and then sets phase->evalfunc to that expression.
@@ -1746,34 +1752,47 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
  * expressions in the AggStatePerPhase, and reuse when appropriate.
  */
 static void
-hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
+agg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 {
 	AggStatePerPhase phase;
 	int			i = minslot ? 1 : 0;
 	int			j = nullcheck ? 1 : 0;
 
 	Assert(aggstate->aggstrategy == AGG_HASHED ||
-		   aggstate->aggstrategy == AGG_MIXED);
+		   aggstate->aggstrategy == AGG_MIXED ||
+		   aggstate->aggstrategy == AGG_INDEX);
 
-	if (aggstate->aggstrategy == AGG_HASHED)
-		phase = &aggstate->phases[0];
-	else						/* AGG_MIXED */
+	if (aggstate->aggstrategy == AGG_MIXED)
 		phase = &aggstate->phases[1];
+	else						/* AGG_HASHED or AGG_INDEX */
+		phase = &aggstate->phases[0];
 
 	if (phase->evaltrans_cache[i][j] == NULL)
 	{
 		const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
 		bool		outerfixed = aggstate->ss.ps.outeropsfixed;
-		bool		dohash = true;
-		bool		dosort = false;
+		int			strategy = 0;
 
-		/*
-		 * If minslot is true, that means we are processing a spilled batch
-		 * (inside agg_refill_hash_table()), and we must not advance the
-		 * sorted grouping sets.
-		 */
-		if (aggstate->aggstrategy == AGG_MIXED && !minslot)
-			dosort = true;
+		switch (aggstate->aggstrategy)
+		{
+			case AGG_MIXED:
+				/*
+				 * If minslot is true, that means we are processing a
+				 * spilled batch (inside agg_refill_hash_table()), and we
+				 * must not advance the sorted grouping sets.
+				 */
+				if (!minslot)
+					strategy |= GROUPING_STRATEGY_SORT;
+				/* FALLTHROUGH */
+			case AGG_HASHED:
+				strategy |= GROUPING_STRATEGY_HASH;
+				break;
+			case AGG_INDEX:
+				strategy |= GROUPING_STRATEGY_INDEX;
+				break;
+			default:
+				Assert(false);
+		}
 
 		/* temporarily change the outerops while compiling the expression */
 		if (minslot)
@@ -1783,8 +1802,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 		}
 
 		phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
-														 dosort, dohash,
-														 nullcheck);
+														 strategy, nullcheck);
 
 		/* change back */
 		aggstate->ss.ps.outerops = outerops;
@@ -1803,9 +1821,9 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
  * substantially larger than the initial value.
  */
 void
-hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
-					Size *mem_limit, uint64 *ngroups_limit,
-					int *num_partitions)
+agg_set_limits(double hashentrysize, double input_groups, int used_bits,
+			   Size *mem_limit, uint64 *ngroups_limit,
+			   int *num_partitions)
 {
 	int			npartitions;
 	Size		partition_mem;
@@ -1853,6 +1871,18 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		*ngroups_limit = 1;
 }
 
+static inline bool
+agg_spill_required(AggState *aggstate, Size total_mem)
+{
+	/*
+	 * Don't spill unless there's at least one group in the hash table so we
+	 * can be sure to make progress even in edge cases.
+	 */
+	return aggstate->spill_ngroups_current > 0 &&
+			(total_mem > aggstate->spill_mem_limit ||
+			 aggstate->spill_ngroups_current > aggstate->spill_ngroups_limit);
+}
+
 /*
  * hash_agg_check_limits
  *
@@ -1863,7 +1893,6 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 static void
 hash_agg_check_limits(AggState *aggstate)
 {
-	uint64		ngroups = aggstate->hash_ngroups_current;
 	Size		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
 													 true);
 	Size		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt,
@@ -1874,7 +1903,7 @@ hash_agg_check_limits(AggState *aggstate)
 	bool		do_spill = false;
 
 #ifdef USE_INJECTION_POINTS
-	if (ngroups >= 1000)
+	if (aggstate->spill_ngroups_current >= 1000)
 	{
 		if (IS_INJECTION_POINT_ATTACHED("hash-aggregate-spill-1000"))
 		{
@@ -1888,9 +1917,7 @@ hash_agg_check_limits(AggState *aggstate)
 	 * Don't spill unless there's at least one group in the hash table so we
 	 * can be sure to make progress even in edge cases.
 	 */
-	if (aggstate->hash_ngroups_current > 0 &&
-		(total_mem > aggstate->hash_mem_limit ||
-		 ngroups > aggstate->hash_ngroups_limit))
+	if (agg_spill_required(aggstate, total_mem))
 	{
 		do_spill = true;
 	}
@@ -1899,97 +1926,199 @@ hash_agg_check_limits(AggState *aggstate)
 		hash_agg_enter_spill_mode(aggstate);
 }
 
+static void
+index_agg_check_limits(AggState *aggstate)
+{
+	Size		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt,
+													 true);
+	Size		node_mem = MemoryContextMemAllocated(aggstate->index_nodecxt,
+													 true);
+	Size		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt,
+													  true);
+	Size		tval_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory,
+													 true);
+	Size		total_mem = meta_mem + node_mem + entry_mem + tval_mem;
+	bool		do_spill = false;
+
+#ifdef USE_INJECTION_POINTS
+	if (aggstate->spill_ngroups_current >= 1000)
+	{
+		if (IS_INJECTION_POINT_ATTACHED("index-aggregate-spill-1000"))
+		{
+			do_spill = true;
+			INJECTION_POINT_CACHED("index-aggregate-spill-1000", NULL);
+		}
+	}
+#endif
+
+	if (agg_spill_required(aggstate, total_mem))
+	{
+		do_spill = true;
+	}
+
+	if (do_spill)
+		index_agg_enter_spill_mode(aggstate);
+}
+
 /*
  * Enter "spill mode", meaning that no new groups are added to any of the hash
  * tables. Tuples that would create a new group are instead spilled, and
  * processed later.
  */
-static void
-hash_agg_enter_spill_mode(AggState *aggstate)
+static inline void
+agg_enter_spill_mode(AggState *aggstate, bool ishash)
 {
-	INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
-	aggstate->hash_spill_mode = true;
-	hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
-
-	if (!aggstate->hash_ever_spilled)
+	if (ishash)
 	{
-		Assert(aggstate->hash_tapeset == NULL);
-		Assert(aggstate->hash_spills == NULL);
-
-		aggstate->hash_ever_spilled = true;
-
-		aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
+		INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->table_filled, true);
+	}
+	else
+	{
+		INJECTION_POINT("index-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->index_filled, true);
+	}
+
+	if (!aggstate->spill_ever_happened)
+	{
+		Assert(aggstate->spill_tapeset == NULL);
+		Assert(aggstate->spills == NULL);
 
-		aggstate->hash_spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+		aggstate->spill_ever_happened = true;
+		aggstate->spill_tapeset = LogicalTapeSetCreate(true, NULL, -1);
 
-		for (int setno = 0; setno < aggstate->num_hashes; setno++)
+		if (ishash)
 		{
-			AggStatePerHash perhash = &aggstate->perhash[setno];
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
-
-			hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
+			aggstate->spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+
+			for (int setno = 0; setno < aggstate->num_hashes; setno++)
+			{
+				AggStatePerHash perhash = &aggstate->perhash[setno];
+				HashAggSpill *spill = &aggstate->spills[setno];
+
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
 							   perhash->aggnode->numGroups,
 							   aggstate->hashentrysize);
+			}
+		}
+		else
+		{
+			aggstate->spills = palloc(sizeof(HashAggSpill));
+			agg_spill_init(aggstate->spills, aggstate->spill_tapeset, 0,
+						   aggstate->perindex->aggnode->numGroups,
+						   aggstate->hashentrysize);
 		}
 	}
 }
 
+static void
+hash_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, true);
+}
+
+static void
+index_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, false);
+}
+
 /*
  * Update metrics after filling the hash table.
  *
  * If reading from the outer plan, from_tape should be false; if reading from
  * another tape, from_tape should be true.
  */
-static void
-hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+static inline void
+agg_update_spill_metrics(AggState *aggstate, bool from_tape, int npartitions, bool ishash)
 {
 	Size		meta_mem;
 	Size		entry_mem;
-	Size		hashkey_mem;
+	Size		key_mem;
 	Size		buffer_mem;
 	Size		total_mem;
 
 	if (aggstate->aggstrategy != AGG_MIXED &&
-		aggstate->aggstrategy != AGG_HASHED)
+		aggstate->aggstrategy != AGG_HASHED &&
+		aggstate->aggstrategy != AGG_INDEX)
 		return;
 
-	/* memory for the hash table itself */
-	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
-
-	/* memory for hash entries */
-	entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
-
-	/* memory for byref transition states */
-	hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
-
+	if (ishash)
+	{
+		/* memory for the hash table itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
+
+		/* memory for hash entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	}
+	else
+	{
+		/* memory for the index itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt, true);
+
+		/* memory for the index nodes */
+		meta_mem += MemoryContextMemAllocated(aggstate->index_nodecxt, true);
+
+		/* memory for index entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory, true);
+	}
+
 	/* memory for read/write tape buffers, if spilled */
 	buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
 	if (from_tape)
 		buffer_mem += HASHAGG_READ_BUFFER_SIZE;
 
 	/* update peak mem */
-	total_mem = meta_mem + entry_mem + hashkey_mem + buffer_mem;
-	if (total_mem > aggstate->hash_mem_peak)
-		aggstate->hash_mem_peak = total_mem;
+	total_mem = meta_mem + entry_mem + key_mem + buffer_mem;
+	if (total_mem > aggstate->spill_mem_peak)
+		aggstate->spill_mem_peak = total_mem;
 
 	/* update disk usage */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		uint64		disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
+		uint64		disk_used = LogicalTapeSetBlocks(aggstate->spill_tapeset) * (BLCKSZ / 1024);
 
-		if (aggstate->hash_disk_used < disk_used)
-			aggstate->hash_disk_used = disk_used;
+		if (aggstate->spill_disk_used < disk_used)
+			aggstate->spill_disk_used = disk_used;
 	}
 
 	/* update hashentrysize estimate based on contents */
-	if (aggstate->hash_ngroups_current > 0)
+	if (aggstate->spill_ngroups_current > 0)
 	{
-		aggstate->hashentrysize =
-			TupleHashEntrySize() +
-			(hashkey_mem / (double) aggstate->hash_ngroups_current);
+		if (ishash)
+		{
+			aggstate->hashentrysize =
+				TupleHashEntrySize() +
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
+		else
+		{
+			/* index stores MinimalTuples directly without any wrapper */
+			aggstate->hashentrysize =
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
 	}
 }
 
+static void
+hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, true);
+}
+
+static void
+index_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, false);
+}
+
 /*
  * Create memory contexts used for hash aggregation.
  */
@@ -2048,6 +2177,33 @@ hash_create_memory(AggState *aggstate)
 
 }
 
+/*
+ * Create memory contexts used for index aggregation.
+ */
+static void
+index_create_memory(AggState *aggstate)
+{
+	Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
+
+	aggstate->indexcontext = CreateWorkExprContext(aggstate->ss.ps.state);
+
+	aggstate->index_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													"IndexAgg meta context",
+													ALLOCSET_DEFAULT_SIZES);
+	aggstate->index_nodecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg node context",
+												ALLOCSET_SMALL_SIZES);
+
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+	aggstate->index_entrycxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg table context",
+												ALLOCSET_DEFAULT_MINSIZE,
+												ALLOCSET_DEFAULT_INITSIZE,
+												maxBlockSize);
+}
+
 /*
  * Choose a reasonable number of buckets for the initial hash table size.
  */
@@ -2141,7 +2297,7 @@ initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
 	AggStatePerGroup pergroup;
 	int			transno;
 
-	aggstate->hash_ngroups_current++;
+	aggstate->spill_ngroups_current++;
 	hash_agg_check_limits(aggstate);
 
 	/* no need to allocate or initialize per-group state */
@@ -2196,9 +2352,9 @@ lookup_hash_entries(AggState *aggstate)
 		bool	   *p_isnew;
 
 		/* if hash table already spilled, don't create new entries */
-		p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
-		select_current_set(aggstate, setno, true);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_HASH);
 		prepare_hash_slot(perhash,
 						  outerslot,
 						  hashslot);
@@ -2214,15 +2370,15 @@ lookup_hash_entries(AggState *aggstate)
 		}
 		else
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 			TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
 
 			if (spill->partitions == NULL)
-				hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
-								   perhash->aggnode->numGroups,
-								   aggstate->hashentrysize);
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perhash->aggnode->numGroups,
+							   aggstate->hashentrysize);
 
-			hashagg_spill_tuple(aggstate, spill, slot, hash);
+			agg_spill_tuple(aggstate, spill, slot, hash);
 			pergroup[setno] = NULL;
 		}
 	}
@@ -2265,6 +2421,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_INDEX:
+				if (!node->index_filled)
+					agg_fill_index(node);
+
+				result = agg_retrieve_index(node);
+				break;
 		}
 
 		if (!TupIsNull(result))
@@ -2381,7 +2543,7 @@ agg_retrieve_direct(AggState *aggstate)
 				aggstate->table_filled = true;
 				ResetTupleHashIterator(aggstate->perhash[0].hashtable,
 									   &aggstate->perhash[0].hashiter);
-				select_current_set(aggstate, 0, true);
+				select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
 				return agg_retrieve_hash_table(aggstate);
 			}
 			else
@@ -2601,7 +2763,7 @@ agg_retrieve_direct(AggState *aggstate)
 
 		prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
 
-		select_current_set(aggstate, currentSet, false);
+		select_current_set(aggstate, currentSet, GROUPING_STRATEGY_SORT);
 
 		finalize_aggregates(aggstate,
 							peragg,
@@ -2683,19 +2845,19 @@ agg_refill_hash_table(AggState *aggstate)
 	HashAggBatch *batch;
 	AggStatePerHash perhash;
 	HashAggSpill spill;
-	LogicalTapeSet *tapeset = aggstate->hash_tapeset;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
 	bool		spill_initialized = false;
 
-	if (aggstate->hash_batches == NIL)
+	if (aggstate->spill_batches == NIL)
 		return false;
 
 	/* hash_batches is a stack, with the top item at the end of the list */
-	batch = llast(aggstate->hash_batches);
-	aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
+	batch = llast(aggstate->spill_batches);
+	aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
 
-	hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
-						batch->used_bits, &aggstate->hash_mem_limit,
-						&aggstate->hash_ngroups_limit, NULL);
+	agg_set_limits(aggstate->hashentrysize, batch->input_card,
+				   batch->used_bits, &aggstate->spill_mem_limit,
+				   &aggstate->spill_ngroups_limit, NULL);
 
 	/*
 	 * Each batch only processes one grouping set; set the rest to NULL so
@@ -2712,7 +2874,7 @@ agg_refill_hash_table(AggState *aggstate)
 	for (int setno = 0; setno < aggstate->num_hashes; setno++)
 		ResetTupleHashTable(aggstate->perhash[setno].hashtable);
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 
 	/*
 	 * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
@@ -2726,7 +2888,7 @@ agg_refill_hash_table(AggState *aggstate)
 		aggstate->phase = &aggstate->phases[aggstate->current_phase];
 	}
 
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 
 	perhash = &aggstate->perhash[aggstate->current_set];
 
@@ -2737,19 +2899,19 @@ agg_refill_hash_table(AggState *aggstate)
 	 * We still need the NULL check, because we are only processing one
 	 * grouping set at a time and the rest will be NULL.
 	 */
-	hashagg_recompile_expressions(aggstate, true, true);
+	agg_recompile_expressions(aggstate, true, true);
 
 	INJECTION_POINT("hash-aggregate-process-batch", NULL);
 	for (;;)
 	{
-		TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
+		TupleTableSlot *spillslot = aggstate->spill_rslot;
 		TupleTableSlot *hashslot = perhash->hashslot;
 		TupleHashTable hashtable = perhash->hashtable;
 		TupleHashEntry entry;
 		MinimalTuple tuple;
 		uint32		hash;
 		bool		isnew = false;
-		bool	   *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		bool	   *p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2782,11 +2944,11 @@ agg_refill_hash_table(AggState *aggstate)
 				 * that we don't assign tapes that will never be used.
 				 */
 				spill_initialized = true;
-				hashagg_spill_init(&spill, tapeset, batch->used_bits,
-								   batch->input_card, aggstate->hashentrysize);
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
 			}
 			/* no memory for a new group, spill */
-			hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
 
 			aggstate->hash_pergroup[batch->setno] = NULL;
 		}
@@ -2806,16 +2968,16 @@ agg_refill_hash_table(AggState *aggstate)
 
 	if (spill_initialized)
 	{
-		hashagg_spill_finish(aggstate, &spill, batch->setno);
+		agg_spill_finish(aggstate, &spill, batch->setno);
 		hash_agg_update_metrics(aggstate, true, spill.npartitions);
 	}
 	else
 		hash_agg_update_metrics(aggstate, true, 0);
 
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 
 	/* prepare to walk the first hash table */
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 	ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
 						   &aggstate->perhash[batch->setno].hashiter);
 
@@ -2975,14 +3137,14 @@ agg_retrieve_hash_table_in_memory(AggState *aggstate)
 }
 
 /*
- * hashagg_spill_init
+ * agg_spill_init
  *
  * Called after we determined that spilling is necessary. Chooses the number
  * of partitions to create, and initializes them.
  */
 static void
-hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
-				   double input_groups, double hashentrysize)
+agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
+			   double input_groups, double hashentrysize)
 {
 	int			npartitions;
 	int			partition_bits;
@@ -3018,14 +3180,13 @@ hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
 }
 
 /*
- * hashagg_spill_tuple
+ * agg_spill_tuple
  *
- * No room for new groups in the hash table. Save for later in the appropriate
- * partition.
+ * No room for new groups in memory. Save for later in the appropriate partition.
  */
 static Size
-hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-					TupleTableSlot *inputslot, uint32 hash)
+agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+				TupleTableSlot *inputslot, uint32 hash)
 {
 	TupleTableSlot *spillslot;
 	int			partition;
@@ -3039,7 +3200,7 @@ hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
 	/* spill only attributes that we actually need */
 	if (!aggstate->all_cols_needed)
 	{
-		spillslot = aggstate->hash_spill_wslot;
+		spillslot = aggstate->spill_wslot;
 		slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
 		ExecClearTuple(spillslot);
 		for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
@@ -3167,14 +3328,14 @@ hashagg_finish_initial_spills(AggState *aggstate)
 	int			setno;
 	int			total_npartitions = 0;
 
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			total_npartitions += spill->npartitions;
-			hashagg_spill_finish(aggstate, spill, setno);
+			agg_spill_finish(aggstate, spill, setno);
 		}
 
 		/*
@@ -3182,21 +3343,21 @@ hashagg_finish_initial_spills(AggState *aggstate)
 		 * processing batches of spilled tuples. The initial spill structures
 		 * are no longer needed.
 		 */
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	hash_agg_update_metrics(aggstate, false, total_npartitions);
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 }
 
 /*
- * hashagg_spill_finish
+ * agg_spill_finish
  *
  * Transform spill partitions into new batches.
  */
 static void
-hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
+agg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 {
 	int			i;
 	int			used_bits = 32 - spill->shift;
@@ -3223,8 +3384,8 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 		new_batch = hashagg_batch_new(tape, setno,
 									  spill->ntuples[i], cardinality,
 									  used_bits);
-		aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
-		aggstate->hash_batches_used++;
+		aggstate->spill_batches = lappend(aggstate->spill_batches, new_batch);
+		aggstate->spill_batches_used++;
 	}
 
 	pfree(spill->ntuples);
@@ -3239,33 +3400,670 @@ static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
 	/* free spills from initial pass */
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		int			setno;
 
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			pfree(spill->ntuples);
 			pfree(spill->partitions);
 		}
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	/* free batches */
-	list_free_deep(aggstate->hash_batches);
-	aggstate->hash_batches = NIL;
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
 
 	/* close tape set */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		LogicalTapeSetClose(aggstate->hash_tapeset);
-		aggstate->hash_tapeset = NULL;
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
 	}
 }
+
+/*
+ * agg_fill_index
+ *
+ * Read all tuples from the outer plan and insert them into the in-memory
+ * index, spilling to disk when the memory limit is reached.
+ */
+static void
+agg_fill_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *tmpcontext = aggstate->tmpcontext;
+
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *outerslot;
+
+		outerslot = fetch_input_tuple(aggstate);
+		if (TupIsNull(outerslot))
+			break;
+
+		/* set up for lookup_index_entries and advance_aggregates */
+		tmpcontext->ecxt_outertuple = outerslot;
 
+	/* insert input tuple into the index, possibly spilling to disk */
+		lookup_index_entries(aggstate);
+
+		/* Advance the aggregates (or combine functions) */
+		advance_aggregates(aggstate);
+
+		/*
+		 * Reset per-input-tuple context after each tuple, but note that the
+		 * hash lookups do this too
+		 */
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	/*
+	 * Mark the index as filled here, so that after recompilation the
+	 * expression will expect a MinimalTuple instead of the outer plan's slot type.
+	 */
+	aggstate->index_filled = true;
+
+	indexagg_finish_initial_spills(aggstate);
+
+	/*
+	 * This is only useful when no spill occurred and projection happens
+	 * in memory, but initialize it anyway.
+	 */
+	select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
+	InitTupleIndexIterator(perindex->index, &perindex->iter);
+}
+
+/*
+ * Extract the attributes that make up the grouping key into the
+ * indexslot. This is necessary to perform key comparisons in the index.
+ */
+static void
+prepare_index_slot(AggStatePerIndex perindex,
+				   TupleTableSlot *inputslot,
+				   TupleTableSlot *indexslot)
+{
+	slot_getsomeattrs(inputslot, perindex->largestGrpColIdx);
+	ExecClearTuple(indexslot);
+
+	for (int i = 0; i < perindex->numCols; ++i)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		indexslot->tts_values[i] = inputslot->tts_values[varNumber];
+		indexslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
+	}
+	ExecStoreVirtualTuple(indexslot);
+}
+
+static void
+indexagg_reset_spill_state(AggState *aggstate)
+{
+	/* free spills from initial pass */
+	if (aggstate->spills != NULL)
+	{
+		HashAggSpill *spill = &aggstate->spills[0];
+		pfree(spill->ntuples);
+		pfree(spill->partitions);
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
+	}
+
+	/* free batches */
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
+
+	/* close tape set */
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * Initialize the per-group transition states of a freshly-created index entry.
+ */
+static void
+initialize_index_entry(AggState *aggstate, TupleIndex index, TupleIndexEntry entry)
+{
+	AggStatePerGroup pergroup;
+
+	aggstate->spill_ngroups_current++;
+	index_agg_check_limits(aggstate);
+
+	/* no need to allocate or initialize per-group state */
+	if (aggstate->numtrans == 0)
+		return;
+
+	pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(index, entry);
+
+	/*
+	 * Initialize aggregates for the new tuple group; lookup_index_entries()
+	 * has already selected the relevant grouping set.
+	 */
+	for (int transno = 0; transno < aggstate->numtrans; ++transno)
+	{
+		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		AggStatePerGroup pergroupstate = &pergroup[transno];
+
+		initialize_aggregate(aggstate, pertrans, pergroupstate);
+	}
+}
+
+/*
+ * Create a new sorted run from the index currently held in memory.
+ */
+static void
+indexagg_save_index_run(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *econtext;
+	TupleIndexIteratorData iter;
+	AggStatePerAgg peragg;
+	TupleTableSlot *firstSlot;
+	TupleIndexEntry entry;
+	TupleTableSlot *indexslot;
+	AggStatePerGroup pergroup;
+
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	indexslot = perindex->indexslot;
+
+	InitTupleIndexIterator(perindex->index, &iter);
+
+	tuplemerge_start_run(aggstate->mergestate);
+
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+	{
+		MinimalTuple tuple = TupleIndexEntryGetMinimalTuple(entry);
+		TupleTableSlot *output;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(tuple, indexslot, false);
+		slot_getallattrs(indexslot);
+
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		output = project_aggregates(aggstate);
+		if (output)
+			tuplemerge_puttupleslot(aggstate->mergestate, output);
+	}
+
+	tuplemerge_end_run(aggstate->mergestate);
+}
+
+/*
+ * Refill the in-memory index with the tuples of the given spilled batch.
+ */
+static void
+indexagg_refill_batch(AggState *aggstate, HashAggBatch *batch)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *spillslot = aggstate->spill_rslot;
+	TupleTableSlot *indexslot = perindex->indexslot;
+	TupleIndex index = perindex->index;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
+	HashAggSpill spill;
+	bool	spill_initialized = false;
+	int nspill = 0;
+
+	agg_set_limits(aggstate->hashentrysize, batch->input_card, batch->used_bits,
+				   &aggstate->spill_mem_limit, &aggstate->spill_ngroups_limit, NULL);
+
+	ReScanExprContext(aggstate->indexcontext);
+
+	MemoryContextReset(aggstate->index_entrycxt);
+	MemoryContextReset(aggstate->index_nodecxt);
+	ResetTupleIndex(perindex->index);
+
+	aggstate->spill_ngroups_current = 0;
+
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	agg_recompile_expressions(aggstate, true, true);
+
+	for (;;)
+	{
+		MinimalTuple tuple;
+		TupleIndexEntry entry;
+		bool		isnew = false;
+		bool	   *p_isnew;
+		uint32		hash;
+
+		CHECK_FOR_INTERRUPTS();
+
+		tuple = hashagg_batch_read(batch, &hash);
+		if (tuple == NULL)
+			break;
+
+		ExecStoreMinimalTuple(tuple, spillslot, true);
+		aggstate->tmpcontext->ecxt_outertuple = spillslot;
+
+		prepare_index_slot(perindex, spillslot, indexslot);
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		entry = TupleIndexLookup(index, indexslot, p_isnew);
+
+		if (entry != NULL)
+		{
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			aggstate->all_pergroups[batch->setno] = TupleIndexEntryGetAdditional(index, entry);
+			advance_aggregates(aggstate);
+		}
+		else
+		{
+			if (!spill_initialized)
+			{
+				spill_initialized = true;
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
+			}
+			nspill++;
+
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
+			aggstate->all_pergroups[batch->setno] = NULL;
+		}
+
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	LogicalTapeClose(batch->input_tape);
+
+	if (spill_initialized)
+	{
+		agg_spill_finish(aggstate, &spill, 0);
+		index_agg_update_metrics(aggstate, true, spill.npartitions);
+	}
+	else
+		index_agg_update_metrics(aggstate, true, 0);
+
+	aggstate->spill_mode = false;
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	pfree(batch);
+}
+
+/*
+ * Finish initial spills for index aggregation: save the in-memory index as
+ * the first sorted run, refill the index from each spilled batch to create
+ * further runs, then perform the final external merge.
+ */
+static void
+indexagg_finish_initial_spills(AggState *aggstate)
+{
+	HashAggSpill *spill;
+	AggStatePerIndex perindex;
+	Sort		 *sort;
+
+	if (!aggstate->spill_ever_happened)
+		return;
+
+	Assert(aggstate->spills != NULL);
+
+	spill = aggstate->spills;
+	agg_spill_finish(aggstate, aggstate->spills, 0);
+
+	index_agg_update_metrics(aggstate, false, spill->npartitions);
+	aggstate->spill_mode = false;
+
+	pfree(aggstate->spills);
+	aggstate->spills = NULL;
+
+	perindex = aggstate->perindex;
+	sort = aggstate->index_sort;
+	aggstate->mergestate = tuplemerge_begin_heap(aggstate->ss.ps.ps_ResultTupleDesc,
+												 perindex->numKeyCols,
+												 perindex->idxKeyColIdxTL,
+												 sort->sortOperators,
+												 sort->collations,
+												 sort->nullsFirst,
+												 work_mem, NULL);
+	/*
+	 * Some data was spilled.  Index aggregate requires sorted output, so we
+	 * must process all remaining spilled data and produce sorted runs for
+	 * the external merge.  The first saved run is the current in-memory index.
+	 */
+	indexagg_save_index_run(aggstate);
+
+	while (aggstate->spill_batches != NIL)
+	{
+		HashAggBatch *batch = llast(aggstate->spill_batches);
+		aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
+
+		indexagg_refill_batch(aggstate, batch);
+		indexagg_save_index_run(aggstate);
+	}
+
+	tuplemerge_performmerge(aggstate->mergestate);
+}
+
+/*
+ * Compute the hash of the grouping key columns in the given slot, used to
+ * assign a spilled tuple to a partition.
+ */
+static uint32
+index_calculate_input_slot_hash(AggState *aggstate,
+								TupleTableSlot *inputslot)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext oldcxt;
+	uint32 hash;
+	bool isnull;
+
+	oldcxt = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+	perindex->exprcontext->ecxt_innertuple = inputslot;
+	hash = DatumGetUInt32(ExecEvalExpr(perindex->indexhashexpr,
+									   perindex->exprcontext,
+									   &isnull));
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return hash;
+}
+
+/*
+ * lookup_index_entries
+ *
+ * Insert the current input tuple into the in-memory index, or spill it if
+ * no new groups can be created.
+ */
+static void
+lookup_index_entries(AggState *aggstate)
+{
+	int numGroupingSets = Max(aggstate->maxsets, 1);
+	AggStatePerGroup *pergroup = aggstate->all_pergroups;
+	TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
+
+	for (int setno = 0; setno < numGroupingSets; ++setno)
+	{
+		AggStatePerIndex	perindex = &aggstate->perindex[setno];
+		TupleIndex		index = perindex->index;
+		TupleTableSlot *indexslot = perindex->indexslot;
+		TupleIndexEntry	entry;
+		bool			isnew = false;
+		bool		   *p_isnew;
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_INDEX);
+
+		prepare_index_slot(perindex, outerslot, indexslot);
+
+		/* Lookup entry in btree */
+		entry = TupleIndexLookup(perindex->index, indexslot, p_isnew);
+
+		/* Entry was found or created in memory - no disk spill needed */
+		if (entry != NULL)
+		{
+			/* Initialize its trans state if just created */
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			pergroup[setno] = TupleIndexEntryGetAdditional(index, entry);
+		}
+		else
+		{
+			HashAggSpill *spill = &aggstate->spills[setno];
+			uint32 hash;
+			
+			if (spill->partitions == NULL)
+			{
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perindex->aggnode->numGroups,
+							   aggstate->hashentrysize);
+			}
+
+			hash = index_calculate_input_slot_hash(aggstate, indexslot);
+			agg_spill_tuple(aggstate, spill, outerslot, hash);
+			pergroup[setno] = NULL;
+		}
+	}
+}
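
The insert-or-spill decision above can be sketched with a toy model in which a tiny fixed-capacity table stands in for the in-memory B+tree: existing groups keep advancing in memory, while tuples for new groups are routed to a batch by hash once the limit is hit (all names and sizes here are illustrative assumptions, not the patch's API):

```c
#include <assert.h>
#include <string.h>

/*
 * Toy sketch of "insert until the memory limit, then spill by hash".
 * A fixed-capacity array stands in for the in-memory index; overflow
 * keys are counted per partition as if written to spill tapes.
 */
#define MEM_CAPACITY 4
#define NPARTITIONS  2

struct toy_agg
{
	int		keys[MEM_CAPACITY];		/* "index" entries (group keys) */
	long	counts[MEM_CAPACITY];	/* per-group transition state */
	int		nentries;
	int		spill_mode;
	int		spilled[NPARTITIONS];	/* tuples routed to each batch */
};

static void
toy_agg_input(struct toy_agg *agg, int key)
{
	for (int i = 0; i < agg->nentries; i++)
	{
		if (agg->keys[i] == key)
		{
			agg->counts[i]++;		/* existing group: advance in memory */
			return;
		}
	}

	if (!agg->spill_mode && agg->nentries < MEM_CAPACITY)
	{
		agg->keys[agg->nentries] = key;	/* new in-memory group */
		agg->counts[agg->nentries] = 1;
		agg->nentries++;
		return;
	}

	/* memory limit reached: route the tuple to a batch by hash */
	agg->spill_mode = 1;
	agg->spilled[(unsigned int) key % NPARTITIONS]++;
}
```

Note that, as with hash aggregation, tuples belonging to groups already present in memory are still aggregated in place even after spill mode begins.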
+
+static TupleTableSlot *
+agg_retrieve_index_in_memory(AggState *aggstate)
+{
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+	AggStatePerIndex perindex;
+	AggStatePerAgg peragg;
+	AggStatePerGroup pergroup;
+	TupleTableSlot *result;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	perindex = &aggstate->perindex[aggstate->current_set];
+
+	for (;;)
+	{
+		TupleIndexEntry entry;
+		TupleTableSlot *indexslot = perindex->indexslot;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		entry = TupleIndexIteratorNext(&perindex->iter);
+		if (entry == NULL)
+			return NULL;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(TupleIndexEntryGetMinimalTuple(entry), indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+		
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+		
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		result = project_aggregates(aggstate);
+		if (result)
+			return result;
+	}
+	
+	/* no more groups */
+	return NULL;
+}
+
+static TupleTableSlot *
+agg_retrieve_index_merge(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *slot = perindex->mergeslot;
+	TupleTableSlot *resultslot = aggstate->ss.ps.ps_ResultTupleSlot;
+	
+	ExecClearTuple(slot);
+	
+	if (!tuplesort_gettupleslot(aggstate->mergestate, true, true, slot, NULL))
+		return NULL;
+
+	slot_getallattrs(slot);
+	ExecClearTuple(resultslot);
+	
+	for (int i = 0; i < resultslot->tts_tupleDescriptor->natts; ++i)
+	{
+		resultslot->tts_values[i] = slot->tts_values[i];
+		resultslot->tts_isnull[i] = slot->tts_isnull[i];
+	}
+	ExecStoreVirtualTuple(resultslot);
+
+	return resultslot;
+}
+
+static TupleTableSlot *
+agg_retrieve_index(AggState *aggstate)
+{
+	if (aggstate->spill_ever_happened)
+		return agg_retrieve_index_merge(aggstate);
+	else
+		return agg_retrieve_index_in_memory(aggstate);
+}
+
+static void
+build_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext metacxt = aggstate->index_metacxt;
+	MemoryContext entrycxt = aggstate->index_entrycxt;
+	MemoryContext nodecxt = aggstate->index_nodecxt;
+	MemoryContext oldcxt;
+	Size	additionalsize;
+	Oid	   *eqfuncoids;
+	Sort   *sort;
+
+	Assert(aggstate->aggstrategy == AGG_INDEX);
+
+	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
+	sort = aggstate->index_sort;
+
+	/* inmem index */
+	perindex->index = BuildTupleIndex(perindex->indexslot->tts_tupleDescriptor,
+									  perindex->numKeyCols,
+									  perindex->idxKeyColIdxIndex,
+									  sort->sortOperators,
+									  sort->collations,
+									  sort->nullsFirst,
+									  additionalsize,
+									  metacxt,
+									  entrycxt,
+									  nodecxt);
+
+	/* disk spill logic */
+	oldcxt = MemoryContextSwitchTo(metacxt);
+	execTuplesHashPrepare(perindex->numKeyCols, perindex->aggnode->grpOperators,
+						  &eqfuncoids, &perindex->hashfunctions);
+	perindex->indexhashexpr =
+		ExecBuildHash32FromAttrs(perindex->indexslot->tts_tupleDescriptor,
+								 perindex->indexslot->tts_ops,
+								 perindex->hashfunctions,
+								 perindex->aggnode->grpCollations,
+								 perindex->numKeyCols,
+								 perindex->idxKeyColIdxIndex,
+								 &aggstate->ss.ps,
+								 0);
+	perindex->exprcontext = CreateStandaloneExprContext();
+	MemoryContextSwitchTo(oldcxt);
+}
+
+static void
+find_index_columns(AggState *aggstate)
+{
+	Bitmapset  *base_colnos;
+	Bitmapset  *aggregated_colnos;
+	TupleDesc	scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	List	   *outerTlist = outerPlanState(aggstate)->plan->targetlist;
+	EState	   *estate = aggstate->ss.ps.state;
+	AggStatePerIndex perindex;
+	Bitmapset  *colnos;
+	AttrNumber *sortColIdx;
+	List	   *indexTlist = NIL;
+	TupleDesc   indexDesc;
+	int			maxCols;
+	int			i;
+
+	find_cols(aggstate, &aggregated_colnos, &base_colnos);
+	aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
+	aggstate->max_colno_needed = 0;
+	aggstate->all_cols_needed = true;
+
+	for (i = 0; i < scanDesc->natts; i++)
+	{
+		int		colno = i + 1;
+
+		if (bms_is_member(colno, aggstate->colnos_needed))
+			aggstate->max_colno_needed = colno;
+		else
+			aggstate->all_cols_needed = false;
+	}
+
+	perindex = aggstate->perindex;
+	colnos = bms_copy(base_colnos);
+
+	if (aggstate->phases[0].grouped_cols)
+	{
+		Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[0];
+		ListCell  *lc;
+		foreach(lc, aggstate->all_grouped_cols)
+		{
+			int attnum = lfirst_int(lc);
+			if (!bms_is_member(attnum, grouped_cols))
+				colnos = bms_del_member(colnos, attnum);
+		}
+	}
+
+	maxCols = bms_num_members(colnos) + perindex->numKeyCols;
+
+	perindex->idxKeyColIdxInput = palloc(maxCols * sizeof(AttrNumber));
+	perindex->idxKeyColIdxIndex = palloc(perindex->numKeyCols * sizeof(AttrNumber));
+
+	/* Add all the sorting/grouping columns to colnos */
+	sortColIdx = aggstate->index_sort->sortColIdx;
+	for (i = 0; i < perindex->numKeyCols; i++)
+		colnos = bms_add_member(colnos, sortColIdx[i]);
+	
+	for (i = 0; i < perindex->numKeyCols; i++)
+	{
+		perindex->idxKeyColIdxInput[i] = sortColIdx[i];
+		perindex->idxKeyColIdxIndex[i] = i + 1;
+
+		perindex->numCols++;
+		/* delete already mapped columns */
+		colnos = bms_del_member(colnos, sortColIdx[i]);
+	}
+	
+	/* and the remainig columns */
+	i = -1;
+	while ((i = bms_next_member(colnos, i)) >= 0)
+	{
+		perindex->idxKeyColIdxInput[perindex->numCols] = i;
+		perindex->numCols++;
+	}
+
+	/* build tuple descriptor for the index */
+	perindex->largestGrpColIdx = 0;
+	for (i = 0; i < perindex->numCols; i++)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		
+		indexTlist = lappend(indexTlist, list_nth(outerTlist, varNumber));
+		perindex->largestGrpColIdx = Max(varNumber + 1, perindex->largestGrpColIdx);
+	}
+
+	indexDesc = ExecTypeFromTL(indexTlist);
+	perindex->indexslot = ExecAllocTableSlot(&estate->es_tupleTable, indexDesc,
+										   &TTSOpsMinimalTuple);
+	list_free(indexTlist);
+	bms_free(colnos);
+
+	bms_free(base_colnos);
+}
 
 /* -----------------
  * ExecInitAgg
@@ -3297,10 +4095,12 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	int			numGroupingSets = 1;
 	int			numPhases;
 	int			numHashes;
+	int			numIndexes;
 	int			i = 0;
 	int			j = 0;
 	bool		use_hashing = (node->aggstrategy == AGG_HASHED ||
 							   node->aggstrategy == AGG_MIXED);
+	bool		use_index = (node->aggstrategy == AGG_INDEX);
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -3337,6 +4137,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	 */
 	numPhases = (use_hashing ? 1 : 2);
 	numHashes = (use_hashing ? 1 : 0);
+	numIndexes = (use_index ? 1 : 0);
 
 	/*
 	 * Calculate the maximum number of grouping sets in any phase; this
@@ -3356,7 +4157,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 			/*
 			 * additional AGG_HASHED aggs become part of phase 0, but all
-			 * others add an extra phase.
+			 * others add an extra phase.  AGG_INDEX does not support grouping
+			 * sets, so the else branch must be AGG_SORTED or AGG_MIXED.
 			 */
 			if (agg->aggstrategy != AGG_HASHED)
 				++numPhases;
@@ -3395,6 +4197,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (use_hashing)
 		hash_create_memory(aggstate);
+	else if (use_index)
+		index_create_memory(aggstate);
 
 	ExecAssignExprContext(estate, &aggstate->ss.ps);
 
@@ -3501,6 +4305,13 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		aggstate->phases[0].gset_lengths = palloc_array(int, numHashes);
 		aggstate->phases[0].grouped_cols = palloc_array(Bitmapset *, numHashes);
 	}
+	else if (numIndexes)
+	{
+		aggstate->perindex = palloc0(sizeof(AggStatePerIndexData) * numIndexes);
+		aggstate->phases[0].numsets = 0;
+		aggstate->phases[0].gset_lengths = palloc(numIndexes * sizeof(int));
+		aggstate->phases[0].grouped_cols = palloc(numIndexes * sizeof(Bitmapset *));
+	}
 
 	phase = 0;
 	for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
@@ -3513,6 +4324,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
 			sortnode = castNode(Sort, outerPlan(aggnode));
 		}
+		else if (use_index)
+		{
+			Assert(list_length(node->chain) == 1);
+
+			aggnode = node;
+			sortnode = castNode(Sort, linitial(node->chain));
+			/*
+			 * The chain contains a single element, so advance the loop
+			 * variable to make this the only iteration.
+			 */
+			phaseidx++;
+		}
 		else
 		{
 			aggnode = node;
@@ -3549,6 +4372,35 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
 			continue;
 		}
+		else if (aggnode->aggstrategy == AGG_INDEX)
+		{
+			AggStatePerPhase phasedata = &aggstate->phases[0];
+			AggStatePerIndex perindex;
+			Bitmapset *cols;
+			
+			Assert(phase == 0);
+			Assert(sortnode);
+
+			i = phasedata->numsets++;
+			
+			/* phase 0 always points to the "real" Agg in the index case */
+			phasedata->aggnode = node;
+			phasedata->aggstrategy = node->aggstrategy;
+			phasedata->sortnode = sortnode;
+
+			perindex = &aggstate->perindex[i];
+			perindex->aggnode = aggnode;
+			aggstate->index_sort = sortnode;
+
+			phasedata->gset_lengths[i] = perindex->numKeyCols = aggnode->numCols;
+
+			cols = NULL;
+			for (j = 0; j < aggnode->numCols; ++j)
+				cols = bms_add_member(cols, aggnode->grpColIdx[j]);
+				
+			phasedata->grouped_cols[i] = cols;
+			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
+		}
 		else
 		{
 			AggStatePerPhase phasedata = &aggstate->phases[++phase];
@@ -3666,7 +4518,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	aggstate->all_pergroups = palloc0_array(AggStatePerGroup, numGroupingSets + numHashes);
 	pergroups = aggstate->all_pergroups;
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy != AGG_HASHED && node->aggstrategy != AGG_INDEX)
 	{
 		for (i = 0; i < numGroupingSets; i++)
 		{
@@ -3680,18 +4532,15 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	/*
 	 * Hashing can only appear in the initial phase.
 	 */
-	if (use_hashing)
+	if (use_hashing || use_index)
 	{
 		Plan	   *outerplan = outerPlan(node);
 		double		totalGroups = 0;
 
-		aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsMinimalTuple);
-		aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsVirtual);
-
-		/* this is an array of pointers, not structures */
-		aggstate->hash_pergroup = pergroups;
+		aggstate->spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsMinimalTuple);
+		aggstate->spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsVirtual);
 
 		aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
 													  outerplan->plan_width,
@@ -3706,20 +4555,115 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		for (int k = 0; k < aggstate->num_hashes; k++)
 			totalGroups += aggstate->perhash[k].aggnode->numGroups;
 
-		hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
-							&aggstate->hash_mem_limit,
-							&aggstate->hash_ngroups_limit,
-							&aggstate->hash_planned_partitions);
-		find_hash_columns(aggstate);
+		agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
+					   &aggstate->spill_mem_limit,
+					   &aggstate->spill_ngroups_limit,
+					   &aggstate->spill_planned_partitions);
 
-		/* Skip massive memory allocation if we are just doing EXPLAIN */
-		if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-			build_hash_tables(aggstate);
+		if (use_hashing)
+		{
+			/* this is an array of pointers, not structures */
+			aggstate->hash_pergroup = pergroups;
+	
+			find_hash_columns(aggstate);
+
+			/* Skip massive memory allocation if we are just doing EXPLAIN */
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_hash_tables(aggstate);
+			aggstate->table_filled = false;
+		}
+		else
+		{
+			find_index_columns(aggstate);
+
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_index(aggstate);
+			aggstate->index_filled = false;
+		}
 
-		aggstate->table_filled = false;
 
 		/* Initialize this to 1, meaning nothing spilled, yet */
-		aggstate->hash_batches_used = 1;
+		aggstate->spill_batches_used = 1;
+	}
+
+	/*
+	 * Index aggregation may need to spill to disk and perform an external
+	 * merge.  The spilled tuples are already projected, so they have a
+	 * different TupleDesc than the ones used in memory (inputDesc and
+	 * indexDesc).
+	 */
+	if (use_index)
+	{
+		AggStatePerIndex perindex = aggstate->perindex;
+		ListCell *lc;
+		List *targetlist = aggstate->ss.ps.plan->targetlist;
+		AttrNumber *attr_mapping_tl = 
+						palloc0(sizeof(AttrNumber) * list_length(targetlist));
+		AttrNumber *keyColIdxResult;
+
+		/*
+		 * Build the grouping column attribute mapping and store it in
+		 * attr_mapping_tl.  If there is no mapping for a column (it is
+		 * projected away), InvalidAttrNumber is stored; otherwise, the
+		 * index of the indexDesc column holding this attribute.
+		 */
+		foreach (lc, targetlist)
+		{
+			TargetEntry *te = (TargetEntry *)lfirst(lc);
+			Var *group_var;
+
+			/* All grouping expressions in targetlist stored as OUTER Vars */
+			if (!IsA(te->expr, Var))
+				continue;
+			
+			group_var = (Var *)te->expr;
+			if (group_var->varno != OUTER_VAR)
+				continue;
+
+			attr_mapping_tl[foreach_current_index(lc)] = group_var->varattno;
+		}
+
+		/* Mapping is built and now create reverse mapping */
+		keyColIdxResult = palloc0(sizeof(AttrNumber) * list_length(outerPlan(node)->targetlist));
+		for (i = 0; i < list_length(targetlist); ++i)
+		{
+			AttrNumber outer_attno = attr_mapping_tl[i];
+			AttrNumber existingIdx;
+
+			if (!AttributeNumberIsValid(outer_attno))
+				continue;
+
+			existingIdx = keyColIdxResult[outer_attno - 1];
+			
+			/* attnumbers can be duplicated, so keep the first occurrence */
+			if (AttributeNumberIsValid(existingIdx) && existingIdx <= outer_attno)
+				continue;
+
+			/*
+			 * A column can be referenced in the query even though the
+			 * planner decided to remove it from the grouping.
+			 */
+			if (!bms_is_member(outer_attno, all_grouped_cols))
+				continue;
+
+			keyColIdxResult[outer_attno - 1] = i + 1;
+		}
+
+		perindex->idxKeyColIdxTL = palloc(sizeof(AttrNumber) * perindex->numKeyCols);
+		for (i = 0; i < perindex->numKeyCols; ++i)
+		{
+			AttrNumber attno = keyColIdxResult[perindex->idxKeyColIdxInput[i] - 1];
+			if (!AttributeNumberIsValid(attno))
+				elog(ERROR, "could not locate group by attributes in targetlist for index mapping");
+
+			perindex->idxKeyColIdxTL[i] = attno;
+		}
+
+		pfree(attr_mapping_tl);
+		pfree(keyColIdxResult);
+
+		perindex->mergeslot = ExecInitExtraTupleSlot(estate,
+													 aggstate->ss.ps.ps_ResultTupleDesc, 
+													 &TTSOpsMinimalTuple);
 	}
 
 	/*
@@ -3732,13 +4676,19 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		aggstate->current_phase = 0;
 		initialize_phase(aggstate, 0);
-		select_current_set(aggstate, 0, true);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
+	}
+	else if (node->aggstrategy == AGG_INDEX)
+	{
+		aggstate->current_phase = 0;
+		initialize_phase(aggstate, 0);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
 	}
 	else
 	{
 		aggstate->current_phase = 1;
 		initialize_phase(aggstate, 1);
-		select_current_set(aggstate, 0, false);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_SORT);
 	}
 
 	/*
@@ -4066,8 +5016,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
 	{
 		AggStatePerPhase phase = &aggstate->phases[phaseidx];
-		bool		dohash = false;
-		bool		dosort = false;
+		int			strategy;
 
 		/* phase 0 doesn't necessarily exist */
 		if (!phase->aggnode)
@@ -4079,8 +5028,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			 * Phase one, and only phase one, in a mixed agg performs both
 			 * sorting and aggregation.
 			 */
-			dohash = true;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_HASH | GROUPING_STRATEGY_SORT;
 		}
 		else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
 		{
@@ -4094,19 +5042,20 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		else if (phase->aggstrategy == AGG_PLAIN ||
 				 phase->aggstrategy == AGG_SORTED)
 		{
-			dohash = false;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_SORT;
 		}
 		else if (phase->aggstrategy == AGG_HASHED)
 		{
-			dohash = true;
-			dosort = false;
+			strategy = GROUPING_STRATEGY_HASH;
+		}
+		else if (phase->aggstrategy == AGG_INDEX)
+		{
+			strategy = GROUPING_STRATEGY_INDEX;
 		}
 		else
 			Assert(false);
 
-		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
-											 false);
+		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, strategy, false);
 
 		/* cache compiled expression for outer slot without NULL check */
 		phase->evaltrans_cache[0][0] = phase->evaltrans;
@@ -4409,9 +5358,9 @@ ExecEndAgg(AggState *node)
 
 		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
 		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		si->hash_batches_used = node->hash_batches_used;
-		si->hash_disk_used = node->hash_disk_used;
-		si->hash_mem_peak = node->hash_mem_peak;
+		si->hash_batches_used = node->spill_batches_used;
+		si->hash_disk_used = node->spill_disk_used;
+		si->hash_mem_peak = node->spill_mem_peak;
 	}
 
 	/* Make sure we have closed any open tuplesorts */
@@ -4421,7 +5370,10 @@ ExecEndAgg(AggState *node)
 	if (node->sort_out)
 		tuplesort_end(node->sort_out);
 
-	hashagg_reset_spill_state(node);
+	if (node->aggstrategy == AGG_INDEX)
+		indexagg_reset_spill_state(node);
+	else
+		hashagg_reset_spill_state(node);
 
 	/* Release hash tables too */
 	if (node->hash_metacxt != NULL)
@@ -4434,6 +5386,26 @@ ExecEndAgg(AggState *node)
 		MemoryContextDelete(node->hash_tuplescxt);
 		node->hash_tuplescxt = NULL;
 	}
+	if (node->index_metacxt != NULL)
+	{
+		MemoryContextDelete(node->index_metacxt);
+		node->index_metacxt = NULL;
+	}
+	if (node->index_entrycxt != NULL)
+	{
+		MemoryContextDelete(node->index_entrycxt);
+		node->index_entrycxt = NULL;
+	}
+	if (node->index_nodecxt != NULL)
+	{
+		MemoryContextDelete(node->index_nodecxt);
+		node->index_nodecxt = NULL;
+	}
+	if (node->mergestate)
+	{
+		tuplesort_end(node->mergestate);
+		node->mergestate = NULL;
+	}
 
 	for (transno = 0; transno < node->numtrans; transno++)
 	{
@@ -4451,6 +5423,8 @@ ExecEndAgg(AggState *node)
 		ReScanExprContext(node->aggcontexts[setno]);
 	if (node->hashcontext)
 		ReScanExprContext(node->hashcontext);
+	if (node->indexcontext)
+		ReScanExprContext(node->indexcontext);
 
 	outerPlan = outerPlanState(node);
 	ExecEndNode(outerPlan);
@@ -4486,12 +5460,27 @@ ExecReScanAgg(AggState *node)
 		 * we can just rescan the existing hash table; no need to build it
 		 * again.
 		 */
-		if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
 			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
 		{
 			ResetTupleHashIterator(node->perhash[0].hashtable,
 								   &node->perhash[0].hashiter);
-			select_current_set(node, 0, true);
+			select_current_set(node, 0, GROUPING_STRATEGY_HASH);
+			return;
+		}
+	}
+
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		if (!node->index_filled)
+			return;
+
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
+			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
+		{
+			AggStatePerIndex perindex = node->perindex;
+			ResetTupleIndexIterator(perindex->index, &perindex->iter);
+			select_current_set(node, 0, GROUPING_STRATEGY_INDEX);
 			return;
 		}
 	}
@@ -4545,9 +5534,9 @@ ExecReScanAgg(AggState *node)
 	{
 		hashagg_reset_spill_state(node);
 
-		node->hash_ever_spilled = false;
-		node->hash_spill_mode = false;
-		node->hash_ngroups_current = 0;
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
 
 		ReScanExprContext(node->hashcontext);
 		/* Rebuild empty hash table(s) */
@@ -4555,10 +5544,33 @@ ExecReScanAgg(AggState *node)
 		node->table_filled = false;
 		/* iterator will be reset when the table is filled */
 
-		hashagg_recompile_expressions(node, false, false);
+		agg_recompile_expressions(node, false, false);
 	}
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		indexagg_reset_spill_state(node);
+
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
+		
+		ReScanExprContext(node->indexcontext);
+		MemoryContextReset(node->index_entrycxt);
+		MemoryContextReset(node->index_nodecxt);
+
+		build_index(node);
+		node->index_filled = false;
+
+		agg_recompile_expressions(node, false, false);
+
+		if (node->mergestate)
+		{
+			tuplesort_end(node->mergestate);
+			node->mergestate = NULL;
+		}
+	}
+	else if (node->aggstrategy != AGG_HASHED)
 	{
 		/*
 		 * Reset the per-group state (in particular, mark transvalues null)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 88ae529e843..fc349707778 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1900,6 +1900,7 @@ static void
 inittapestate(Tuplesortstate *state, int maxTapes)
 {
 	int64		tapeSpace;
+	Size		memtuplesSize;
 
 	/*
 	 * Decrease availMem to reflect the space needed for tape buffers; but
@@ -1912,7 +1913,16 @@ inittapestate(Tuplesortstate *state, int maxTapes)
 	 */
 	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
 
-	if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
+	/*
+	 * In merge state the in-memory tuple array is not used during initial
+	 * run creation; tuples are written directly to the tapes.
+	 */
+	if (state->memtuples != NULL)
+		memtuplesSize = GetMemoryChunkSpace(state->memtuples);
+	else
+		memtuplesSize = 0;
+
+	if (tapeSpace + memtuplesSize < state->allowedMem)
 		USEMEM(state, tapeSpace);
 
 	/*
@@ -2031,11 +2041,14 @@ mergeruns(Tuplesortstate *state)
 
 	/*
 	 * We no longer need a large memtuples array.  (We will allocate a smaller
-	 * one for the heap later.)
+	 * one for the heap later.)  Note that in merge state this array can be NULL.
 	 */
-	FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
-	pfree(state->memtuples);
-	state->memtuples = NULL;
+	if (state->memtuples)
+	{
+		FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
+		pfree(state->memtuples);
+		state->memtuples = NULL;
+	}
 
 	/*
 	 * Initialize the slab allocator.  We need one slab slot per input tape,
@@ -3157,3 +3170,189 @@ ssup_datum_int32_cmp(Datum x, Datum y, SortSupport ssup)
 	else
 		return 0;
 }
+
+/*
+ *    tuplemerge_begin_common
+ *
+ * Create a new Tuplesortstate for performing a merge only.  This is used
+ * when we know the input is already sorted but stored on multiple tapes,
+ * so only the merge step has to be performed.
+ *
+ * Unlike tuplesort_begin_common it does not accept sortopt, because none
+ * of the current options (random access and bounded sort) are supported
+ * by merge.
+ */
+Tuplesortstate *
+tuplemerge_begin_common(int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state;
+	MemoryContext maincontext;
+	MemoryContext sortcontext;
+	MemoryContext oldcontext;
+
+	/*
+	 * Memory context surviving tuplesort_reset.  This memory context holds
+	 * data which is useful to keep while sorting multiple similar batches.
+	 */
+	maincontext = AllocSetContextCreate(CurrentMemoryContext,
+										"TupleMerge main",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Create a working memory context for one sort operation.  The content of
+	 * this context is deleted by tuplesort_reset.
+	 */
+	sortcontext = AllocSetContextCreate(maincontext,
+										"TupleMerge merge",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Make the Tuplesortstate within the per-sortstate context.  This way, we
+	 * don't need a separate pfree() operation for it at shutdown.
+	 */
+	oldcontext = MemoryContextSwitchTo(maincontext);
+
+	state = (Tuplesortstate *) palloc0(sizeof(Tuplesortstate));
+
+	if (trace_sort)
+		pg_rusage_init(&state->ru_start);
+
+	state->base.sortopt = TUPLESORT_NONE;
+	state->base.tuples = true;
+	state->abbrevNext = 10;
+
+	/*
+	 * workMem is forced to be at least 64KB, the current minimum valid value
+	 * for the work_mem GUC.  This is a defense against parallel sort callers
+	 * that divide out memory among many workers in a way that leaves each
+	 * with very little memory.
+	 */
+	state->allowedMem = Max(workMem, 64) * (int64) 1024;
+	state->base.sortcontext = sortcontext;
+	state->base.maincontext = maincontext;
+
+	/*
+	 * After all of the other non-parallel-related state, we setup all of the
+	 * state needed for each batch.
+	 */
+
+	/*
+	 * Merging does not accept RANDOMACCESS, so the only possible tuple
+	 * context is Bump, which saves some cycles.
+	 */
+	state->base.tuplecontext = BumpContextCreate(state->base.sortcontext,
+												 "Caller tuples",
+												 ALLOCSET_DEFAULT_SIZES);
+	
+	state->status = TSS_BUILDRUNS;
+	state->bounded = false;
+	state->boundUsed = false;
+	state->availMem = state->allowedMem;
+	
+	/*
+	 * When performing a merge we do not need the in-memory array for
+	 * sorting, so memtuples stays NULL.  The count/size fields are still
+	 * initialized so that code which consults them in merge mode does not
+	 * fail.
+	 */
+	state->memtuples = NULL;
+	state->memtupcount = 0;
+	state->memtupsize = INITIAL_MEMTUPSIZE;
+	state->growmemtuples = true;
+	state->slabAllocatorUsed = false;
+
+	/*
+	 * Tape variables (inputTapes, outputTapes, etc.) will be initialized by
+	 * inittapes(), if needed.
+	 */
+	state->result_tape = NULL;	/* flag that result tape has not been formed */
+	state->tapeset = NULL;
+	
+	inittapes(state, true);
+
+	/*
+	 * Initialize parallel-related state based on coordination information
+	 * from caller
+	 */
+	if (!coordinate)
+	{
+		/* Serial sort */
+		state->shared = NULL;
+		state->worker = -1;
+		state->nParticipants = -1;
+	}
+	else if (coordinate->isWorker)
+	{
+		/* Parallel worker produces exactly one final run from all input */
+		state->shared = coordinate->sharedsort;
+		state->worker = worker_get_identifier(state);
+		state->nParticipants = -1;
+	}
+	else
+	{
+		/* Parallel leader state only used for final merge */
+		state->shared = coordinate->sharedsort;
+		state->worker = -1;
+		state->nParticipants = coordinate->nParticipants;
+		Assert(state->nParticipants >= 1);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_start_run(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+		return;
+
+	selectnewtape(state);
+	state->memtupcount = 0;
+}
+
+void
+tuplemerge_performmerge(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+	{
+		/*
+		 * We have started a new run, but no tuples were written to it.
+		 * mergeruns expects each run to have at least one tuple; otherwise
+		 * it will fail to even fill the initial merge heap.
+		 */
+		state->nOutputRuns--;
+	}
+	else
+		state->memtupcount = 0;
+
+	mergeruns(state);
+
+	state->current = 0;
+	state->eof_reached = false;
+	state->markpos_block = 0L;
+	state->markpos_offset = 0;
+	state->markpos_eof = false;
+}
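
The final merge primes its heap with one head tuple per sorted run, which is why an empty run must be discounted before calling mergeruns. A self-contained sketch of a k-way merge over pre-sorted runs (illustrative only; tuplesort's actual implementation is heap-based and tape-backed):

```c
#include <assert.h>

/*
 * Toy sketch of a k-way merge of pre-sorted runs: one cursor per run,
 * and each step emits the smallest head element.  Supports up to 8 runs.
 * Purely illustrative; not tuplesort's implementation.
 */
static int
kway_merge(const int *runs[], const int runlens[], int nruns, int *out)
{
	int		pos[8] = {0};
	int		nout = 0;

	for (;;)
	{
		int		best = -1;

		/* find the run whose head element is smallest */
		for (int r = 0; r < nruns; r++)
		{
			if (pos[r] >= runlens[r])
				continue;
			if (best < 0 || runs[r][pos[r]] < runs[best][pos[best]])
				best = r;
		}
		if (best < 0)
			break;				/* all runs exhausted */
		out[nout++] = runs[best][pos[best]++];
	}
	return nout;
}
```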
+
+void
+tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple, Size tuplen)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->base.sortcontext);
+
+	Assert(state->destTape);	
+	WRITETUP(state, state->destTape, tuple);
+
+	MemoryContextSwitchTo(oldcxt);
+	
+	state->memtupcount++;
+}
+
+void
+tuplemerge_end_run(Tuplesortstate *state)
+{
+	if (state->memtupcount != 0)
+		markrunend(state->destTape);
+}
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index 079a51c474d..5f8afa8a17a 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -2071,3 +2071,108 @@ readtup_datum(Tuplesortstate *state, SortTuple *stup,
 	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
 		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
 }
+
+Tuplesortstate *
+tuplemerge_begin_heap(TupleDesc tupDesc,
+					  int nkeys, AttrNumber *attNums,
+					  Oid *sortOperators, Oid *sortCollations,
+					  bool *nullsFirstFlags,
+					  int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state = tuplemerge_begin_common(workMem, coordinate);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+
+	Assert(nkeys > 0);
+
+	if (trace_sort)
+		elog(LOG,
+			 "begin tuple merge: nkeys = %d, workMem = %d", nkeys, workMem);
+
+	base->nKeys = nkeys;
+
+	TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
+								false,	/* no unique check */
+								nkeys,
+								workMem,
+								false,
+								PARALLEL_SORT(coordinate));
+
+	base->removeabbrev = removeabbrev_heap;
+	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
+	base->writetup = writetup_heap;
+	base->readtup = readtup_heap;
+	base->haveDatum1 = true;
+	base->arg = tupDesc;		/* assume we need not copy tupDesc */
+
+	/* Prepare SortSupport data for each column */
+	base->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		SortSupport sortKey = base->sortKeys + i;
+
+		Assert(attNums[i] != 0);
+		Assert(sortOperators[i] != 0);
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization
+	 * is only of value to pass-by-value types anyway, whereas abbreviated
+	 * keys are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !base->sortKeys->abbrev_converter)
+		base->onlyKey = base->sortKeys;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+	TupleDesc	tupDesc = (TupleDesc) base->arg;
+	SortTuple	stup;
+	MinimalTuple tuple;
+	HeapTupleData htup;
+	Size		tuplen;
+
+	/* copy the tuple into sort storage */
+	tuple = ExecCopySlotMinimalTuple(slot);
+	stup.tuple = tuple;
+	/* set up first-column key value */
+	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
+	stup.datum1 = heap_getattr(&htup,
+							   base->sortKeys[0].ssup_attno,
+							   tupDesc,
+							   &stup.isnull1);
+
+	/* GetMemoryChunkSpace is not supported for bump contexts */
+	if (TupleSortUseBumpTupleCxt(base->sortopt))
+		tuplen = MAXALIGN(tuple->t_len);
+	else
+		tuplen = GetMemoryChunkSpace(tuple);
+
+	tuplemerge_puttuple_common(state, &stup, tuplen);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 6192cc8d143..7c9efe77ab9 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -393,8 +393,16 @@ extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
+
+/*
+ * Which strategy to use for aggregation/grouping
+ */
+#define GROUPING_STRATEGY_SORT			1
+#define GROUPING_STRATEGY_HASH			(1 << 1)
+#define GROUPING_STRATEGY_INDEX			(1 << 2)
+
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
-									bool doSort, bool doHash, bool nullcheck);
+									int groupStrategy, bool nullcheck);
 extern ExprState *ExecBuildHash32FromAttrs(TupleDesc desc,
 										   const TupleTableSlotOps *ops,
 										   FmgrInfo *hashfunctions,
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 6c4891bbaeb..8361d000878 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -321,6 +321,33 @@ typedef struct AggStatePerHashData
 	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */
 }			AggStatePerHashData;
 
+/*
+ * AggStatePerIndexData - per-index state
+ *
+ * Logic is the same as for AggStatePerHashData - one of these for each
+ * grouping set.
+ */
+typedef struct AggStatePerIndexData
+{
+	TupleIndex	index;			/* current in-memory index data */
+	MemoryContext metacxt;		/* memory context containing TupleIndex */
+	MemoryContext tempctx;		/* short-lived context */
+	TupleTableSlot *indexslot; 	/* slot for loading index */
+	int			numCols;		/* total number of columns in index tuple */
+	int			numKeyCols;		/* number of key columns in index tuple */
+	int			largestGrpColIdx;	/* largest col required for comparison */
+	AttrNumber *idxKeyColIdxInput;	/* key column indices in input slot */
+	AttrNumber *idxKeyColIdxIndex;	/* key column indices in index tuples */
+	TupleIndexIteratorData iter;	/* iterator state for index */
+	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */	
+
+	/* state used only for spill mode */
+	AttrNumber	*idxKeyColIdxTL;	/* key column indices in target list */
+	FmgrInfo    *hashfunctions;	/* tuple hashing function */
+	ExprState   *indexhashexpr;	/* ExprState for hashing index datatype(s) */
+	ExprContext *exprcontext;	/* expression context */
+	TupleTableSlot *mergeslot;	/* slot for loading tuple during merge */
+}			AggStatePerIndexData;
 
 extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
@@ -328,9 +355,9 @@ extern void ExecReScanAgg(AggState *node);
 
 extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 								Size transitionSpace);
-extern void hash_agg_set_limits(double hashentrysize, double input_groups,
-								int used_bits, Size *mem_limit,
-								uint64 *ngroups_limit, int *num_partitions);
+extern void agg_set_limits(double hashentrysize, double input_groups,
+						   int used_bits, Size *mem_limit,
+						   uint64 *ngroups_limit, int *num_partitions);
 
 /* parallel instrumentation support */
 extern void ExecAggEstimate(AggState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 99ee472b51f..3bba2359e11 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2613,6 +2613,7 @@ typedef struct AggStatePerTransData *AggStatePerTrans;
 typedef struct AggStatePerGroupData *AggStatePerGroup;
 typedef struct AggStatePerPhaseData *AggStatePerPhase;
 typedef struct AggStatePerHashData *AggStatePerHash;
+typedef struct AggStatePerIndexData *AggStatePerIndex;
 
 typedef struct AggState
 {
@@ -2628,17 +2629,18 @@ typedef struct AggState
 	AggStatePerAgg peragg;		/* per-Aggref information */
 	AggStatePerTrans pertrans;	/* per-Trans state information */
 	ExprContext *hashcontext;	/* econtexts for long-lived data (hashtable) */
+	ExprContext *indexcontext;	/* econtexts for long-lived data (index) */
 	ExprContext **aggcontexts;	/* econtexts for long-lived data (per GS) */
 	ExprContext *tmpcontext;	/* econtext for input expressions */
-#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
+#define FIELDNO_AGGSTATE_CURAGGCONTEXT 15
 	ExprContext *curaggcontext; /* currently active aggcontext */
 	AggStatePerAgg curperagg;	/* currently active aggregate, if any */
-#define FIELDNO_AGGSTATE_CURPERTRANS 16
+#define FIELDNO_AGGSTATE_CURPERTRANS 17
 	AggStatePerTrans curpertrans;	/* currently active trans state, if any */
 	bool		input_done;		/* indicates end of input */
 	bool		agg_done;		/* indicates completion of Agg scan */
 	int			projected_set;	/* The last projected grouping set */
-#define FIELDNO_AGGSTATE_CURRENT_SET 20
+#define FIELDNO_AGGSTATE_CURRENT_SET 21
 	int			current_set;	/* The current grouping set being evaluated */
 	Bitmapset  *grouped_cols;	/* grouped cols in current projection */
 	List	   *all_grouped_cols;	/* list of all grouped cols in DESC order */
@@ -2660,32 +2662,43 @@ typedef struct AggState
 	int			num_hashes;
 	MemoryContext hash_metacxt; /* memory for hash table bucket array */
 	MemoryContext hash_tuplescxt;	/* memory for hash table tuples */
-	struct LogicalTapeSet *hash_tapeset;	/* tape set for hash spill tapes */
-	struct HashAggSpill *hash_spills;	/* HashAggSpill for each grouping set,
-										 * exists only during first pass */
-	TupleTableSlot *hash_spill_rslot;	/* for reading spill files */
-	TupleTableSlot *hash_spill_wslot;	/* for writing spill files */
-	List	   *hash_batches;	/* hash batches remaining to be processed */
-	bool		hash_ever_spilled;	/* ever spilled during this execution? */
-	bool		hash_spill_mode;	/* we hit a limit during the current batch
-									 * and we must not create new groups */
-	Size		hash_mem_limit; /* limit before spilling hash table */
-	uint64		hash_ngroups_limit; /* limit before spilling hash table */
-	int			hash_planned_partitions;	/* number of partitions planned
-											 * for first pass */
-	double		hashentrysize;	/* estimate revised during execution */
-	Size		hash_mem_peak;	/* peak hash table memory usage */
-	uint64		hash_ngroups_current;	/* number of groups currently in
-										 * memory in all hash tables */
-	uint64		hash_disk_used; /* kB of disk space used */
-	int			hash_batches_used;	/* batches used during entire execution */
-
 	AggStatePerHash perhash;	/* array of per-hashtable data */
 	AggStatePerGroup *hash_pergroup;	/* grouping set indexed array of
 										 * per-group pointers */
+	/* Fields used for managing spill mode in hash and index aggs */
+	struct LogicalTapeSet *spill_tapeset;	/* tape set for hash spill tapes */
+	struct HashAggSpill *spills;	/* HashAggSpill for each grouping set,
+									 * exists only during first pass */
+	TupleTableSlot *spill_rslot;	/* for reading spill files */
+	TupleTableSlot *spill_wslot;	/* for writing spill files */
+	List	   *spill_batches;	/* hash batches remaining to be processed */
+
+	bool		spill_ever_happened;	/* ever spilled during this execution? */
+	bool		spill_mode;	/* we hit a limit during the current batch
+							 * and we must not create new groups */
+	Size		spill_mem_limit; /* limit before spilling hash table or index */
+	uint64		spill_ngroups_limit; /* limit before spilling hash table or index */
+	int			spill_planned_partitions;	/* number of partitions planned
+											 * for first pass */
+	double		hashentrysize;	/* estimate revised during execution */
+	Size		spill_mem_peak;	/* peak memory usage of hash table or index */
+	uint64		spill_ngroups_current;	/* number of groups currently in
+										 * memory in all hash tables */
+	uint64		spill_disk_used; /* kB of disk space used */
+	int			spill_batches_used;	/* batches used during entire execution */
+
+	/* these fields are used in AGG_INDEXED mode: */
+	AggStatePerIndex perindex;	/* pointer to per-index state data */
+	bool			index_filled;	/* index filled yet? */
+	MemoryContext	index_metacxt;	/* memory for index structure */
+	MemoryContext	index_nodecxt;	/* memory for index nodes */
+	MemoryContext	index_entrycxt;	/* memory for index entries */
+	Sort		   *index_sort;		/* ordering information for index */
+	Tuplesortstate *mergestate;		/* state for merging projected tuples if
+									 * spill occurred */
 
 	/* support for evaluation of agg input expressions: */
-#define FIELDNO_AGGSTATE_ALL_PERGROUPS 54
+#define FIELDNO_AGGSTATE_ALL_PERGROUPS 62
 	AggStatePerGroup *all_pergroups;	/* array of first ->pergroups, than
 										 * ->hash_pergroup */
 	SharedAggInfo *shared_info; /* one entry per worker */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fb3957e75e5..b0e2d781c01 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -365,6 +365,7 @@ typedef enum AggStrategy
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
 	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_INDEX,					/* grouped agg, build index for input */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..b19dacf5de4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1219,7 +1219,7 @@ typedef struct Agg
 	/* grouping sets to use */
 	List	   *groupingSets;
 
-	/* chained Agg/Sort nodes */
+	/* chained Agg/Sort nodes; for AGG_INDEX this contains a single Sort node */
 	List	   *chain;
 } Agg;
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 0bf55902aa1..f372c3e7e0a 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -475,6 +475,21 @@ extern GinTuple *tuplesort_getgintuple(Tuplesortstate *state, Size *len,
 									   bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
-
+/*
+ * Special state for merge mode.
+ */
+extern Tuplesortstate *tuplemerge_begin_common(int workMem,
+											   SortCoordinate coordinate);
+extern Tuplesortstate *tuplemerge_begin_heap(TupleDesc tupDesc,
+											int nkeys, AttrNumber *attNums,
+											Oid *sortOperators, Oid *sortCollations,
+											bool *nullsFirstFlags,
+											int workMem, SortCoordinate coordinate);
+extern void tuplemerge_start_run(Tuplesortstate *state);
+extern void tuplemerge_end_run(Tuplesortstate *state);
+extern void tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple,
+									   Size tuplen);
+extern void tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot);
+extern void tuplemerge_performmerge(Tuplesortstate *state);
 
 #endif							/* TUPLESORT_H */
-- 
2.43.0

Attachment: v2-0003-make-use-of-IndexAggregate-in-planner-and-explain.patch (text/x-patch)
From bece0df3261a889a452f7b0eb1e85b58b19df9ab Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:34:18 +0300
Subject: [PATCH v2 3/4] make use of IndexAggregate in planner and explain

This commit teaches the planner and EXPLAIN (ANALYZE) about IndexAggregate.

We calculate the cost of IndexAggregate and add an AGG_INDEX path to the
pathlist.  The cost of this node covers building the B+tree (in memory),
the disk spill, and the final external merge.

For EXPLAIN there is only a small change: show sort information in "Group Key".
---
 src/backend/commands/explain.c                | 101 ++++++++++++---
 src/backend/optimizer/path/costsize.c         | 121 ++++++++++++------
 src/backend/optimizer/plan/createplan.c       |  15 ++-
 src/backend/optimizer/plan/planner.c          |  35 +++++
 src/backend/optimizer/util/pathnode.c         |   9 ++
 src/backend/utils/misc/guc_parameters.dat     |   7 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |   3 +-
 src/include/optimizer/cost.h                  |   1 +
 9 files changed, 237 insertions(+), 56 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..9e16c547b06 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -134,7 +134,7 @@ static void show_recursive_union_info(RecursiveUnionState *rstate,
 									  ExplainState *es);
 static void show_memoize_info(MemoizeState *mstate, List *ancestors,
 							  ExplainState *es);
-static void show_hashagg_info(AggState *aggstate, ExplainState *es);
+static void show_agg_spill_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1556,6 +1556,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						pname = "MixedAggregate";
 						strategy = "Mixed";
 						break;
+					case AGG_INDEX:
+						pname = "IndexAggregate";
+						strategy = "Indexed";
+						break;
 					default:
 						pname = "Aggregate ???";
 						strategy = "???";
@@ -2200,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
-			show_hashagg_info((AggState *) planstate, es);
+			show_agg_spill_info((AggState *) planstate, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
@@ -2631,6 +2635,24 @@ show_agg_keys(AggState *astate, List *ancestors,
 
 		if (plan->groupingSets)
 			show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+		else if (plan->aggstrategy == AGG_INDEX)
+		{
+			Sort	   *sort = astate->index_sort;
+
+			/*
+			 * Index Agg reorders the GROUP BY keys to match the ORDER BY
+			 * clause, so the keys themselves are identical; still, show
+			 * additional information about the ordering used, such as
+			 * direction.
+			 */
+			Assert(sort != NULL);
+			show_sort_group_keys(outerPlanState(astate), "Group Key",
+								 plan->numCols, 0,
+								 sort->sortColIdx,
+								 sort->sortOperators,
+								 sort->collations,
+								 sort->nullsFirst,
+								 ancestors, es);
+		}
 		else
 			show_sort_group_keys(outerPlanState(astate), "Group Key",
 								 plan->numCols, 0, plan->grpColIdx,
@@ -3735,47 +3757,67 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 }
 
 /*
- * Show information on hash aggregate memory usage and batches.
+ * Show information on hash or index aggregate memory usage and batches.
  */
 static void
-show_hashagg_info(AggState *aggstate, ExplainState *es)
+show_agg_spill_info(AggState *aggstate, ExplainState *es)
 {
 	Agg		   *agg = (Agg *) aggstate->ss.ps.plan;
-	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->hash_mem_peak);
+	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->spill_mem_peak);
 
 	if (agg->aggstrategy != AGG_HASHED &&
-		agg->aggstrategy != AGG_MIXED)
+		agg->aggstrategy != AGG_MIXED &&
+		agg->aggstrategy != AGG_INDEX)
 		return;
 
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		if (es->costs)
 			ExplainPropertyInteger("Planned Partitions", NULL,
-								   aggstate->hash_planned_partitions, es);
+								   aggstate->spill_planned_partitions, es);
 
 		/*
 		 * During parallel query the leader may have not helped out.  We
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			ExplainPropertyInteger("HashAgg Batches", NULL,
-								   aggstate->hash_batches_used, es);
+								   aggstate->spill_batches_used, es);
 			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
 			ExplainPropertyInteger("Disk Usage", "kB",
-								   aggstate->hash_disk_used, es);
+								   aggstate->spill_disk_used, es);
+		}
+
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64		spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			ExplainPropertyText("Merge Method", mergeMethod, es);
+			ExplainPropertyInteger("Merge Space Used", "kB", spaceUsed, es);
+			ExplainPropertyText("Merge Space Type", spaceType, es);
 		}
 	}
 	else
 	{
 		bool		gotone = false;
 
-		if (es->costs && aggstate->hash_planned_partitions > 0)
+		if (es->costs && aggstate->spill_planned_partitions > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str, "Planned Partitions: %d",
-							 aggstate->hash_planned_partitions);
+							 aggstate->spill_planned_partitions);
 			gotone = true;
 		}
 
@@ -3784,7 +3826,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			if (!gotone)
 				ExplainIndentText(es);
@@ -3792,17 +3834,44 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 				appendStringInfoSpaces(es->str, 2);
 
 			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-							 aggstate->hash_batches_used, memPeakKb);
+							 aggstate->spill_batches_used, memPeakKb);
 			gotone = true;
 
 			/* Only display disk usage if we spilled to disk */
-			if (aggstate->hash_batches_used > 1)
+			if (aggstate->spill_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-								 aggstate->hash_disk_used);
+								 aggstate->spill_disk_used);
 			}
 		}
 
+		/* For index aggregate, show stats for the final merge */
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64		spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			/*
+			 * If we get here, the previous check (for the memory peak) must
+			 * have succeeded (we cannot reach the merge without any in-memory
+			 * work), so don't recheck other state; just start a new line.
+			 */
+			appendStringInfoChar(es->str, '\n');
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Merge Method: %s  %s: " INT64_FORMAT "kB",
+							 mergeMethod, spaceType, spaceUsed);
+			gotone = true;
+		}
+
 		if (gotone)
 			appendStringInfoChar(es->str, '\n');
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a39cc793b4d..a0af4e76f42 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool		enable_tidscan = true;
 bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
+bool		enable_indexagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_memoize = true;
@@ -1848,6 +1849,32 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 									rterm->pathtarget->width);
 }
 
+/*
+ * cost_tuplemerge
+ *		Compute the disk cost of an external merge and add it to *cost.
+ */
+static void
+cost_tuplemerge(double availMem, double input_bytes, double ntuples,
+				Cost comparison_cost, Cost *cost)
+{
+	double		npages = ceil(input_bytes / BLCKSZ);
+	double		nruns = input_bytes / availMem;
+	double		mergeorder = tuplesort_merge_order(availMem);
+	double		log_runs;
+	double		npageaccesses;
+
+	/* Compute logM(r) as log(r) / log(M) */
+	if (nruns > mergeorder)
+		log_runs = ceil(log(nruns) / log(mergeorder));
+	else
+		log_runs = 1.0;
+
+	npageaccesses = 2.0 * npages * log_runs;
+
+	/* Assume 3/4ths of accesses are sequential, 1/4th are not */
+	*cost += npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
+}
+
 /*
  * cost_tuplesort
  *	  Determines and returns the cost of sorting a relation using tuplesort,
@@ -1922,11 +1949,6 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		/*
 		 * We'll have to use a disk-based sort of all the tuples
 		 */
-		double		npages = ceil(input_bytes / BLCKSZ);
-		double		nruns = input_bytes / sort_mem_bytes;
-		double		mergeorder = tuplesort_merge_order(sort_mem_bytes);
-		double		log_runs;
-		double		npageaccesses;
 
 		/*
 		 * CPU costs
@@ -1936,16 +1958,8 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		*startup_cost = comparison_cost * tuples * LOG2(tuples);
 
 		/* Disk costs */
-
-		/* Compute logM(r) as log(r) / log(M) */
-		if (nruns > mergeorder)
-			log_runs = ceil(log(nruns) / log(mergeorder));
-		else
-			log_runs = 1.0;
-		npageaccesses = 2.0 * npages * log_runs;
-		/* Assume 3/4ths of accesses are sequential, 1/4th are not */
-		*startup_cost += npageaccesses *
-			(seq_page_cost * 0.75 + random_page_cost * 0.25);
+		cost_tuplemerge(sort_mem_bytes, input_bytes, tuples, comparison_cost,
+						startup_cost);
 	}
 	else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
 	{
@@ -2770,7 +2784,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
-	else
+	else if (aggstrategy == AGG_HASHED)
 	{
 		/* must be AGG_HASHED */
 		startup_cost = input_total_cost;
@@ -2788,6 +2802,46 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
+	else
+	{
+		/* must be AGG_INDEX */
+		startup_cost = input_total_cost;
+		if (!enable_indexagg)
+			++disabled_nodes;
+
+		startup_cost += aggcosts->transCost.startup;
+		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
+		/* cost of btree comparison */
+		startup_cost += input_tuples * (2.0 * cpu_operator_cost * numGroupCols);
+		startup_cost += aggcosts->finalCost.startup;
+
+		total_cost = startup_cost;
+		total_cost += aggcosts->finalCost.per_tuple * numGroups;
+		/* cost of retrieving from index */
+		total_cost += cpu_tuple_cost * numGroups;
+		output_tuples = numGroups;
+	}
+
+	/*
+	 * If there are quals (HAVING quals), account for their cost and
+	 * selectivity.  Do this before the disk-spill logic, because the output
+	 * cardinality is needed to cost AGG_INDEX's final merge.
+	 */
+	if (quals)
+	{
+		QualCost	qual_cost;
+
+		cost_qual_eval(&qual_cost, quals, root);
+		startup_cost += qual_cost.startup;
+		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+
+		output_tuples = clamp_row_est(output_tuples *
+									  clauselist_selectivity(root,
+															 quals,
+															 0,
+															 JOIN_INNER,
+															 NULL));
+	}
 
 	/*
 	 * Add the disk costs of hash aggregation that spills to disk.
@@ -2802,7 +2856,7 @@ cost_agg(Path *path, PlannerInfo *root,
 	 * Accrue writes (spilled tuples) to startup_cost and to total_cost;
 	 * accrue reads only to total_cost.
 	 */
-	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED)
+	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED || aggstrategy == AGG_INDEX)
 	{
 		double		pages;
 		double		pages_written = 0.0;
@@ -2823,8 +2877,8 @@ cost_agg(Path *path, PlannerInfo *root,
 		hashentrysize = hash_agg_entry_size(list_length(root->aggtransinfos),
 											input_width,
 											aggcosts->transitionSpace);
-		hash_agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
-							&ngroups_limit, &num_partitions);
+		agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
+					   &ngroups_limit, &num_partitions);
 
 		nbatches = Max((numGroups * hashentrysize) / mem_limit,
 					   numGroups / ngroups_limit);
@@ -2861,26 +2915,21 @@ cost_agg(Path *path, PlannerInfo *root,
 		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
 		startup_cost += spill_cost;
 		total_cost += spill_cost;
-	}
 
-	/*
-	 * If there are quals (HAVING quals), account for their cost and
-	 * selectivity.
-	 */
-	if (quals)
-	{
-		QualCost	qual_cost;
+		/* IndexAgg requires a final external-merge stage */
+		if (aggstrategy == AGG_INDEX)
+		{
+			double	output_bytes;
+			Cost	comparison_cost;
 
-		cost_qual_eval(&qual_cost, quals, root);
-		startup_cost += qual_cost.startup;
-		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+			/* size of all projected tuples */
+			output_bytes = path->pathtarget->width * output_tuples;
+			/* default comparison cost */
+			comparison_cost = 2.0 * cpu_operator_cost;
 
-		output_tuples = clamp_row_est(output_tuples *
-									  clauselist_selectivity(root,
-															 quals,
-															 0,
-															 JOIN_INNER,
-															 NULL));
+			cost_tuplemerge(work_mem * 1024.0, output_bytes, output_tuples,
+							comparison_cost, &startup_cost);
+		}
 	}
 
 	path->rows = output_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bc417f93840..de9bb1ef30b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2158,6 +2158,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 	List	   *quals;
+	List	   *chain;
+	AttrNumber *grpColIdx;
 
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
@@ -2169,17 +2171,24 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 
 	quals = order_qual_clauses(root, best_path->qual);
 
+	grpColIdx = extract_grouping_cols(best_path->groupClause, subplan->targetlist);
+
+	/* For index aggregation, record the desired sort order in the chain. */
+	if (best_path->aggstrategy == AGG_INDEX)
+		chain = list_make1(make_sort_from_groupcols(best_path->groupClause, grpColIdx, subplan));
+	else
+		chain = NIL;
+
 	plan = make_agg(tlist, quals,
 					best_path->aggstrategy,
 					best_path->aggsplit,
 					list_length(best_path->groupClause),
-					extract_grouping_cols(best_path->groupClause,
-										  subplan->targetlist),
+					grpColIdx,
 					extract_grouping_ops(best_path->groupClause),
 					extract_grouping_collations(best_path->groupClause,
 												subplan->targetlist),
 					NIL,
-					NIL,
+					chain,
 					best_path->numGroups,
 					best_path->transitionSpace,
 					subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8b22c30559b..cfd2f3ff3a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3877,6 +3877,21 @@ create_grouping_paths(PlannerInfo *root,
 			 (gd ? gd->any_hashable : grouping_is_hashable(root->processed_groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
+		/*
+		 * Determine whether we should consider an index-based implementation
+		 * of grouping.
+		 *
+		 * This is more restrictive: the grouping keys must not only be
+		 * sortable (for the B+tree), but also hashable, so that we can spill
+		 * tuples efficiently and later process each batch.
+		 */
+		if (gd == NULL &&
+			root->numOrderedAggs == 0 &&
+			parse->groupClause != NIL &&
+			grouping_is_sortable(root->processed_groupClause) &&
+			grouping_is_hashable(root->processed_groupClause))
+			flags |= GROUPING_CAN_USE_INDEX;
+
 		/*
 		 * Determine whether partial aggregation is possible.
 		 */
@@ -7108,6 +7123,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups = 0;
@@ -7329,6 +7345,25 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 	}
 
+	if (can_index)
+	{
+		/*
+		 * Generate an IndexAgg path.
+		 */
+		Assert(!parse->groupingSets);
+		add_path(grouped_rel, (Path *)
+				 create_agg_path(root,
+								 grouped_rel,
+								 cheapest_path,
+								 grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_SIMPLE,
+								 root->processed_groupClause,
+								 havingQual,
+								 agg_costs,
+								 dNumGroups));
+	}
+
 	/*
 	 * When partitionwise aggregate is used, we might have fully aggregated
 	 * paths in the partial pathlist, because add_paths_to_append_rel() will
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b6be4ddbd01..2bac26055a7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3030,6 +3030,15 @@ create_agg_path(PlannerInfo *root,
 		else
 			pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	}
+	else if (aggstrategy == AGG_INDEX)
+	{
+		/*
+		 * When using index aggregation, all grouping columns are used as
+		 * comparator keys, so the output is always sorted.
+		 */
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
+																root->processed_tlist);
+	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..776ccd9e2fd 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -868,6 +868,13 @@
   boot_val => 'true',
 },
 
+{ name => 'enable_indexagg', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of index aggregation plans.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'enable_indexagg',
+  boot_val => 'true',
+},
+
 { name => 'enable_indexonlyscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
   short_desc => 'Enables the planner\'s use of index-only-scan plans.',
   flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..307b9ee660d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -410,6 +410,7 @@
 #enable_hashagg = on
 #enable_hashjoin = on
 #enable_incremental_sort = on
+#enable_indexagg = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..f4b2d35b1d9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -3518,7 +3518,8 @@ typedef struct JoinPathExtraData
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
-#define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_CAN_USE_INDEX		0x0004
+#define GROUPING_CAN_PARTIAL_AGG	0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..5d03b5971bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
 extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_indexagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_memoize;
-- 
2.43.0

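For quick manual experimentation with the patch set applied, the planner can be steered toward the new strategy by disabling the competing ones. This mirrors the setup used in the regression tests added in patch 4 (`tenk1` is the standard regress table), so the plan shape shown is the one those tests expect:

```sql
SET enable_hashagg = off;
SET enable_sort = off;
SET enable_indexscan = off;
SET enable_indexagg = on;

EXPLAIN (COSTS OFF)
SELECT unique1, sum(two) FROM tenk1
GROUP BY 1
ORDER BY 1
LIMIT 10;
--  Limit
--    ->  IndexAggregate
--          Group Key: tenk1.unique1
--          ->  Seq Scan on tenk1
```

Note that no separate Sort node appears above the aggregate: because all grouping columns are comparator keys of the in-memory B+tree, the node's output already satisfies the ORDER BY pathkeys.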
v2-0004-fix-tests-for-IndexAggregate.patchtext/x-patch; charset=UTF-8; name=v2-0004-fix-tests-for-IndexAggregate.patchDownload
From f30da61435d89cdd6e1b3e2ba1a9fa047f6785c2 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 10 Dec 2025 17:09:42 +0300
Subject: [PATCH v2 4/4] Fix tests for IndexAggregate

After adding the IndexAggregate node, some test output changed and the
tests broke.  This patch updates the expected output.

It also adds some IndexAggregate-specific tests to aggregates.sql and
partition_aggregate.sql.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  56 ++--
 src/test/regress/expected/aggregates.out      | 299 +++++++++++++++++-
 src/test/regress/expected/eager_aggregate.out |  99 ++++++
 src/test/regress/expected/groupingsets.out    |  38 +--
 src/test/regress/expected/join.out            |   8 +-
 src/test/regress/expected/matview.out         |  14 +-
 .../regress/expected/partition_aggregate.out  | 227 ++++++++++---
 src/test/regress/expected/partition_join.out  | 179 +++++------
 src/test/regress/expected/select_parallel.out |  35 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/aggregates.sql           | 147 ++++++++-
 src/test/regress/sql/eager_aggregate.sql      |  41 +++
 src/test/regress/sql/partition_aggregate.sql  |  32 +-
 src/test/regress/sql/select_parallel.sql      |   3 +
 14 files changed, 932 insertions(+), 249 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48e3185b227..d1cb3e8802c 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2324,29 +2324,26 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.
 -- Aggregate after UNION, for testing setrefs
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
-                                                                     QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                  QUERY PLAN                                                                  
+----------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, (avg((t1.c1 + t2.c1)))
-   ->  Sort
-         Output: t1.c1, (avg((t1.c1 + t2.c1)))
-         Sort Key: t1.c1
+   ->  IndexAggregate
+         Output: t1.c1, avg((t1.c1 + t2.c1))
+         Group Key: t1.c1
          ->  HashAggregate
-               Output: t1.c1, avg((t1.c1 + t2.c1))
-               Group Key: t1.c1
-               ->  HashAggregate
-                     Output: t1.c1, t2.c1
-                     Group Key: t1.c1, t2.c1
-                     ->  Append
-                           ->  Foreign Scan
-                                 Output: t1.c1, t2.c1
-                                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
-                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1"))))
-                           ->  Foreign Scan
-                                 Output: t1_1.c1, t2_1.c1
-                                 Relations: (public.ft1 t1_1) INNER JOIN (public.ft2 t2_1)
-                                 Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1"))))
-(20 rows)
+               Output: t1.c1, t2.c1
+               Group Key: t1.c1, t2.c1
+               ->  Append
+                     ->  Foreign Scan
+                           Output: t1.c1, t2.c1
+                           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+                           Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1"))))
+                     ->  Foreign Scan
+                           Output: t1_1.c1, t2_1.c1
+                           Relations: (public.ft1 t1_1) INNER JOIN (public.ft2 t2_1)
+                           Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r1."C 1"))))
+(17 rows)
 
 SELECT t1c1, avg(t1c1 + t2c1) FROM (SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) UNION SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) AS t (t1c1, t2c1) GROUP BY t1c1 ORDER BY t1c1 OFFSET 100 LIMIT 10;
  t1c1 |         avg          
@@ -3057,18 +3054,15 @@ select c2 * (random() <= 1)::int as sum1, sum(c1) * c2 as sum2 from ft1 group by
 -- Aggregate with unshippable GROUP BY clause are not pushed
 explain (verbose, costs off)
 select c2 * (random() <= 1)::int as c2 from ft2 group by c2 * (random() <= 1)::int order by 1;
-                                  QUERY PLAN                                  
-------------------------------------------------------------------------------
- Sort
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ IndexAggregate
    Output: ((c2 * ((random() <= '1'::double precision))::integer))
-   Sort Key: ((ft2.c2 * ((random() <= '1'::double precision))::integer))
-   ->  HashAggregate
-         Output: ((c2 * ((random() <= '1'::double precision))::integer))
-         Group Key: (ft2.c2 * ((random() <= '1'::double precision))::integer)
-         ->  Foreign Scan on public.ft2
-               Output: (c2 * ((random() <= '1'::double precision))::integer)
-               Remote SQL: SELECT c2 FROM "S 1"."T 1"
-(9 rows)
+   Group Key: (ft2.c2 * ((random() <= '1'::double precision))::integer)
+   ->  Foreign Scan on public.ft2
+         Output: (c2 * ((random() <= '1'::double precision))::integer)
+         Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(6 rows)
 
 -- GROUP BY clause in various forms, cardinal, alias and constant expression
 explain (verbose, costs off)
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index cae8e7bca31..c33eaa0c0ec 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1714,12 +1714,10 @@ EXPLAIN (COSTS OFF) SELECT a, COUNT(a) OVER (PARTITION BY a) FROM t1 GROUP BY AL
 ----------------------------------
  WindowAgg
    Window: w1 AS (PARTITION BY a)
-   ->  Sort
-         Sort Key: a
-         ->  HashAggregate
-               Group Key: a
-               ->  Seq Scan on t1
-(7 rows)
+   ->  IndexAggregate
+         Group Key: a
+         ->  Seq Scan on t1
+(5 rows)
 
 -- all cols
 EXPLAIN (COSTS OFF) SELECT *, count(*) FROM t1 GROUP BY ALL;
@@ -3270,6 +3268,7 @@ FROM generate_series(1, 100) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
 EXPLAIN (COSTS OFF)
@@ -3707,10 +3706,242 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba       |    0 |     1
 (2 rows)
 
+ 
+--
+-- Index Aggregation tests
+--
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: unique1, (sum(two))
+   ->  IndexAggregate
+         Output: unique1, sum(two)
+         Group Key: tenk1.unique1
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ unique1 | sum 
+---------+-----
+       0 |   0
+       1 |   1
+       2 |   0
+       3 |   1
+       4 |   0
+       5 |   1
+       6 |   0
+       7 |   1
+       8 |   0
+       9 |   1
+(10 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, (sum(two))
+   ->  IndexAggregate
+         Output: even, sum(two)
+         Group Key: tenk1.even
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ even | sum 
+------+-----
+    1 |   0
+    3 | 100
+    5 |   0
+    7 | 100
+    9 |   0
+   11 | 100
+   13 |   0
+   15 | 100
+   17 |   0
+   19 | 100
+(10 rows)
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, odd, (sum(unique1))
+   ->  IndexAggregate
+         Output: even, odd, sum(unique1)
+         Group Key: tenk1.even, tenk1.odd
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+ even | odd |  sum   
+------+-----+--------
+    1 |   0 | 495000
+    3 |   2 | 495100
+    5 |   4 | 495200
+    7 |   6 | 495300
+    9 |   8 | 495400
+   11 |  10 | 495500
+   13 |  12 | 495600
+   15 |  14 | 495700
+   17 |  16 | 495800
+   19 |  18 | 495900
+(10 rows)
+
+-- mixing columns between group by and order by
+begin;
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.x, tmp.y
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+ x | y | sum 
+---+---+-----
+ 1 | 8 |   1
+ 2 | 7 |   2
+ 3 | 6 |   3
+ 4 | 5 |   4
+(4 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.y, tmp.x
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+ x | y | sum 
+---+---+-----
+ 4 | 5 |   4
+ 3 | 6 |   3
+ 2 | 7 |   2
+ 1 | 8 |   1
+(4 rows)
+
+--
+-- Index Aggregation Spill tests
+--
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+ unique1 | count | sum  
+---------+-------+------
+    4976 |     1 |  976
+    4977 |     1 |  977
+    4978 |     1 |  978
+    4979 |     1 |  979
+    4980 |     1 |  980
+    4981 |     1 |  981
+    4982 |     1 |  982
+    4983 |     1 |  983
+    4984 |     1 |  984
+    4985 |     1 |  985
+    4986 |     1 |  986
+    4987 |     1 |  987
+    4988 |     1 |  988
+    4989 |     1 |  989
+    4990 |     1 |  990
+    4991 |     1 |  991
+    4992 |     1 |  992
+    4993 |     1 |  993
+    4994 |     1 |  994
+    4995 |     1 |  995
+    4996 |     1 |  996
+    4997 |     1 |  997
+    4998 |     1 |  998
+    4999 |     1 |  999
+    9976 |     1 | 1976
+    9977 |     1 | 1977
+    9978 |     1 | 1978
+    9979 |     1 | 1979
+    9980 |     1 | 1980
+    9981 |     1 | 1981
+    9982 |     1 | 1982
+    9983 |     1 | 1983
+    9984 |     1 | 1984
+    9985 |     1 | 1985
+    9986 |     1 | 1986
+    9987 |     1 | 1987
+    9988 |     1 | 1988
+    9989 |     1 | 1989
+    9990 |     1 | 1990
+    9991 |     1 | 1991
+    9992 |     1 | 1992
+    9993 |     1 | 1993
+    9994 |     1 | 1994
+    9995 |     1 | 1995
+    9996 |     1 | 1996
+    9997 |     1 | 1997
+    9998 |     1 | 1998
+    9999 |     1 | 1999
+(48 rows)
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 --
 -- Hash Aggregation Spill tests
 --
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 select unique1, count(*), sum(twothousand) from tenk1
 group by unique1
@@ -3783,6 +4014,7 @@ select g from generate_series(0, 19999) g;
 analyze agg_data_20k;
 -- Produce results with sorting.
 set enable_hashagg = false;
+set enable_indexagg = false;
 set jit_above_cost = 0;
 explain (costs off)
 select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
@@ -3852,31 +4084,74 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
   from agg_data_2k group by g/2;
 set enable_sort = true;
 set work_mem to default;
+-- Produce results with index aggregation
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+set jit_above_cost = 0;
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+           QUERY PLAN           
+--------------------------------
+ IndexAggregate
+   Group Key: (g % 10000)
+   ->  Seq Scan on agg_data_20k
+(3 rows)
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+set jit_above_cost to default;
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
 -- Compare group aggregation results to hash aggregation results
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
  a | c1 | c2 | c3 
 ---+----+----+----
 (0 rows)
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
@@ -3889,3 +4164,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 5ac966186f7..247915da8f6 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -62,6 +62,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -110,6 +111,50 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ IndexAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, t2.c
+         Hash Cond: (t2.b = t1.b)
+         ->  Seq Scan on public.eager_agg_t2 t2
+               Output: t2.a, t2.b, t2.c
+         ->  Hash
+               Output: t1.a, t1.b
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b
+(12 rows)
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test eager aggregation over join rel
 --
@@ -170,6 +215,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
   FROM eager_agg_t1 t1
@@ -227,6 +273,59 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ IndexAggregate
+   Output: t1.a, avg((t2.c + t3.c))
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, t2.c, t3.c
+         Hash Cond: (t2.b = t1.b)
+         ->  Hash Join
+               Output: t2.c, t2.b, t3.c
+               Hash Cond: (t3.a = t2.a)
+               ->  Seq Scan on public.eager_agg_t3 t3
+                     Output: t3.a, t3.b, t3.c
+               ->  Hash
+                     Output: t2.c, t2.b, t2.a
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.c, t2.b, t2.a
+         ->  Hash
+               Output: t1.a, t1.b
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b
+(19 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test that eager aggregation works for outer join
 --
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 39d35a195bc..46b80db6806 100644
--- a/src/test/regress/expected/groupingsets.out
+++ b/src/test/regress/expected/groupingsets.out
@@ -506,18 +506,15 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                         QUERY PLAN                        
 ----------------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: GROUPING((SubPlan expr_1)), ((SubPlan expr_2))
-   Group Key: ((SubPlan expr_2))
-   ->  Sort
-         Output: ((SubPlan expr_2)), i1.q1
-         Sort Key: ((SubPlan expr_2))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_2), i1.q1
-               SubPlan expr_2
-                 ->  Result
-                       Output: i1.q1
-(11 rows)
+   Group Key: (SubPlan expr_2)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_2), i1.q1
+         SubPlan expr_2
+           ->  Result
+                 Output: i1.q1
+(8 rows)
 
 select grouping(ss.x)
 from int8_tbl i1
@@ -536,21 +533,18 @@ cross join lateral (select (select i1.q1) as x) ss
 group by ss.x;
                    QUERY PLAN                   
 ------------------------------------------------
- GroupAggregate
+ IndexAggregate
    Output: (SubPlan expr_1), ((SubPlan expr_3))
-   Group Key: ((SubPlan expr_3))
-   ->  Sort
-         Output: ((SubPlan expr_3)), i1.q1
-         Sort Key: ((SubPlan expr_3))
-         ->  Seq Scan on public.int8_tbl i1
-               Output: (SubPlan expr_3), i1.q1
-               SubPlan expr_3
-                 ->  Result
-                       Output: i1.q1
+   Group Key: (SubPlan expr_3)
+   ->  Seq Scan on public.int8_tbl i1
+         Output: (SubPlan expr_3), i1.q1
+         SubPlan expr_3
+           ->  Result
+                 Output: i1.q1
    SubPlan expr_1
      ->  Result
            Output: GROUPING((SubPlan expr_2))
-(14 rows)
+(11 rows)
 
 select (select grouping(ss.x))
 from int8_tbl i1
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index edde9e99893..a6e11fb64e7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6211,13 +6211,13 @@ select d.* from d left join (select distinct * from b) s
 explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
-                QUERY PLAN                
-------------------------------------------
+         QUERY PLAN         
+----------------------------
  Merge Right Join
    Merge Cond: (b.id = d.a)
-   ->  Group
+   ->  IndexAggregate
          Group Key: b.id
-         ->  Index Scan using b_pkey on b
+         ->  Seq Scan on b
    ->  Sort
          Sort Key: d.a
          ->  Seq Scan on d
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..8c939c8f397 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -55,14 +55,12 @@ SELECT * FROM mvtest_tm ORDER BY type;
 -- create various views
 EXPLAIN (costs off)
   CREATE MATERIALIZED VIEW mvtest_tvm AS SELECT * FROM mvtest_tv ORDER BY type;
-            QUERY PLAN            
-----------------------------------
- Sort
-   Sort Key: mvtest_t.type
-   ->  HashAggregate
-         Group Key: mvtest_t.type
-         ->  Seq Scan on mvtest_t
-(5 rows)
+         QUERY PLAN         
+----------------------------
+ IndexAggregate
+   Group Key: mvtest_t.type
+   ->  Seq Scan on mvtest_t
+(3 rows)
 
 CREATE MATERIALIZED VIEW mvtest_tvm AS SELECT * FROM mvtest_tv ORDER BY type;
 SELECT * FROM mvtest_tvm;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index c30304b99c7..9e29fe17f45 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -177,8 +177,9 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 ---+-----
 (0 rows)
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
 SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
@@ -370,6 +371,136 @@ SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
    250
 (1 row)
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.c, (sum(pagg_tab.a)), (avg(pagg_tab.b))
+   ->  Append
+         ->  IndexAggregate
+               Group Key: pagg_tab.c
+               Filter: (avg(pagg_tab.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+         ->  IndexAggregate
+               Group Key: pagg_tab_1.c
+               Filter: (avg(pagg_tab_1.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+         ->  IndexAggregate
+               Group Key: pagg_tab_2.c
+               Filter: (avg(pagg_tab_2.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(16 rows)
+
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+  c   | sum  |         avg         | count 
+------+------+---------------------+-------
+ 0000 | 2000 | 12.0000000000000000 |   250
+ 0001 | 2250 | 13.0000000000000000 |   250
+ 0002 | 2500 | 14.0000000000000000 |   250
+ 0006 | 2500 | 12.0000000000000000 |   250
+ 0007 | 2750 | 13.0000000000000000 |   250
+ 0008 | 2000 | 14.0000000000000000 |   250
+(6 rows)
+
+-- When GROUP BY clause does not match; full aggregation by top node.
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.a, (sum(pagg_tab.b)), (avg(pagg_tab.b))
+   ->  IndexAggregate
+         Group Key: pagg_tab.a
+         Filter: (avg(pagg_tab.d) < '15'::numeric)
+         ->  Append
+               ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+(10 rows)
+
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+ a  | sum  |         avg         | count 
+----+------+---------------------+-------
+  0 | 1500 | 10.0000000000000000 |   150
+  1 | 1650 | 11.0000000000000000 |   150
+  2 | 1800 | 12.0000000000000000 |   150
+  3 | 1950 | 13.0000000000000000 |   150
+  4 | 2100 | 14.0000000000000000 |   150
+ 10 | 1500 | 10.0000000000000000 |   150
+ 11 | 1650 | 11.0000000000000000 |   150
+ 12 | 1800 | 12.0000000000000000 |   150
+ 13 | 1950 | 13.0000000000000000 |   150
+ 14 | 2100 | 14.0000000000000000 |   150
+(10 rows)
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+                   QUERY PLAN                   
+------------------------------------------------
+ Merge Append
+   Sort Key: pagg_tab.c
+   ->  IndexAggregate
+         Group Key: pagg_tab.c
+         ->  Seq Scan on pagg_tab_p1 pagg_tab
+   ->  IndexAggregate
+         Group Key: pagg_tab_1.c
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+   ->  IndexAggregate
+         Group Key: pagg_tab_2.c
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(11 rows)
+
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+  c   
+------
+ 0000
+ 0001
+ 0002
+ 0003
+ 0004
+ 0005
+ 0006
+ 0007
+ 0008
+ 0009
+ 0010
+ 0011
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+                   QUERY PLAN                   
+------------------------------------------------
+ IndexAggregate
+   Group Key: pagg_tab.a
+   ->  Append
+         ->  Seq Scan on pagg_tab_p1 pagg_tab_1
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_2
+               Filter: (a < 3)
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_3
+               Filter: (a < 3)
+(9 rows)
+
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+ a 
+---
+ 0
+ 1
+ 2
+(3 rows)
+
+RESET enable_sort;
 RESET enable_hashagg;
 -- ROLLUP, partitionwise aggregation does not apply
 EXPLAIN (COSTS OFF)
@@ -554,6 +685,7 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
                                QUERY PLAN                                
@@ -606,6 +738,7 @@ SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 (6 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
 -- LEFT JOIN, should produce partial partitionwise aggregation plan as
@@ -761,27 +894,25 @@ SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y G
 -- But right now we are unable to do partitionwise join in this case.
 EXPLAIN (COSTS OFF)
 SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20  GROUP BY a.x, b.y ORDER BY 1, 2;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: pagg_tab1.x, pagg_tab2.y
-   ->  HashAggregate
-         Group Key: pagg_tab1.x, pagg_tab2.y
-         ->  Hash Left Join
-               Hash Cond: (pagg_tab1.x = pagg_tab2.y)
-               Filter: ((pagg_tab1.x > 5) OR (pagg_tab2.y < 20))
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ IndexAggregate
+   Group Key: pagg_tab1.x, pagg_tab2.y
+   ->  Hash Left Join
+         Hash Cond: (pagg_tab1.x = pagg_tab2.y)
+         Filter: ((pagg_tab1.x > 5) OR (pagg_tab2.y < 20))
+         ->  Append
+               ->  Seq Scan on pagg_tab1_p1 pagg_tab1_1
+                     Filter: (x < 20)
+               ->  Seq Scan on pagg_tab1_p2 pagg_tab1_2
+                     Filter: (x < 20)
+         ->  Hash
                ->  Append
-                     ->  Seq Scan on pagg_tab1_p1 pagg_tab1_1
-                           Filter: (x < 20)
-                     ->  Seq Scan on pagg_tab1_p2 pagg_tab1_2
-                           Filter: (x < 20)
-               ->  Hash
-                     ->  Append
-                           ->  Seq Scan on pagg_tab2_p2 pagg_tab2_1
-                                 Filter: (y > 10)
-                           ->  Seq Scan on pagg_tab2_p3 pagg_tab2_2
-                                 Filter: (y > 10)
-(18 rows)
+                     ->  Seq Scan on pagg_tab2_p2 pagg_tab2_1
+                           Filter: (y > 10)
+                     ->  Seq Scan on pagg_tab2_p3 pagg_tab2_2
+                           Filter: (y > 10)
+(16 rows)
 
 SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20  GROUP BY a.x, b.y ORDER BY 1, 2;
  x  | y  | count 
@@ -801,27 +932,25 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOI
 -- But right now we are unable to do partitionwise join in this case.
 EXPLAIN (COSTS OFF)
 SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20  GROUP BY a.x, b.y ORDER BY 1, 2;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: pagg_tab1.x, pagg_tab2.y
-   ->  HashAggregate
-         Group Key: pagg_tab1.x, pagg_tab2.y
-         ->  Hash Full Join
-               Hash Cond: (pagg_tab1.x = pagg_tab2.y)
-               Filter: ((pagg_tab1.x > 5) OR (pagg_tab2.y < 20))
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ IndexAggregate
+   Group Key: pagg_tab1.x, pagg_tab2.y
+   ->  Hash Full Join
+         Hash Cond: (pagg_tab1.x = pagg_tab2.y)
+         Filter: ((pagg_tab1.x > 5) OR (pagg_tab2.y < 20))
+         ->  Append
+               ->  Seq Scan on pagg_tab1_p1 pagg_tab1_1
+                     Filter: (x < 20)
+               ->  Seq Scan on pagg_tab1_p2 pagg_tab1_2
+                     Filter: (x < 20)
+         ->  Hash
                ->  Append
-                     ->  Seq Scan on pagg_tab1_p1 pagg_tab1_1
-                           Filter: (x < 20)
-                     ->  Seq Scan on pagg_tab1_p2 pagg_tab1_2
-                           Filter: (x < 20)
-               ->  Hash
-                     ->  Append
-                           ->  Seq Scan on pagg_tab2_p2 pagg_tab2_1
-                                 Filter: (y > 10)
-                           ->  Seq Scan on pagg_tab2_p3 pagg_tab2_2
-                                 Filter: (y > 10)
-(18 rows)
+                     ->  Seq Scan on pagg_tab2_p2 pagg_tab2_1
+                           Filter: (y > 10)
+                     ->  Seq Scan on pagg_tab2_p3 pagg_tab2_2
+                           Filter: (y > 10)
+(16 rows)
 
 SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
  x  | y  | count 
@@ -839,16 +968,14 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOI
 -- Empty join relation because of empty outer side, no partitionwise agg plan
 EXPLAIN (COSTS OFF)
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
-                  QUERY PLAN                  
-----------------------------------------------
- GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: pagg_tab1.y
-   ->  Sort
-         Sort Key: pagg_tab1.y
-         ->  Result
-               Replaces: Join on b, pagg_tab1
-               One-Time Filter: false
-(7 rows)
+   ->  Result
+         Replaces: Join on b, pagg_tab1
+         One-Time Filter: false
+(5 rows)
 
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
  x | y | count 
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 17d27ef3d46..9292d94baab 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -647,52 +647,39 @@ EXPLAIN (COSTS OFF)
 SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
   WHERE a BETWEEN 490 AND 510
   GROUP BY 1, 2 ORDER BY 1, 2;
-                                                   QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
- Group
+                                             QUERY PLAN                                              
+-----------------------------------------------------------------------------------------------------
+ IndexAggregate
    Group Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
-   ->  Merge Append
-         Sort Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
-         ->  Group
-               Group Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
+   ->  Append
+         ->  Merge Full Join
+               Merge Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
+               Filter: ((COALESCE(prt1_1.a, p2_1.a) >= 490) AND (COALESCE(prt1_1.a, p2_1.a) <= 510))
                ->  Sort
-                     Sort Key: (COALESCE(prt1.a, p2.a)), (COALESCE(prt1.b, p2.b))
-                     ->  Merge Full Join
-                           Merge Cond: ((prt1.a = p2.a) AND (prt1.b = p2.b))
-                           Filter: ((COALESCE(prt1.a, p2.a) >= 490) AND (COALESCE(prt1.a, p2.a) <= 510))
-                           ->  Sort
-                                 Sort Key: prt1.a, prt1.b
-                                 ->  Seq Scan on prt1_p1 prt1
-                           ->  Sort
-                                 Sort Key: p2.a, p2.b
-                                 ->  Seq Scan on prt2_p1 p2
-         ->  Group
-               Group Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
+                     Sort Key: prt1_1.a, prt1_1.b
+                     ->  Seq Scan on prt1_p1 prt1_1
                ->  Sort
-                     Sort Key: (COALESCE(prt1_1.a, p2_1.a)), (COALESCE(prt1_1.b, p2_1.b))
-                     ->  Merge Full Join
-                           Merge Cond: ((prt1_1.a = p2_1.a) AND (prt1_1.b = p2_1.b))
-                           Filter: ((COALESCE(prt1_1.a, p2_1.a) >= 490) AND (COALESCE(prt1_1.a, p2_1.a) <= 510))
-                           ->  Sort
-                                 Sort Key: prt1_1.a, prt1_1.b
-                                 ->  Seq Scan on prt1_p2 prt1_1
-                           ->  Sort
-                                 Sort Key: p2_1.a, p2_1.b
-                                 ->  Seq Scan on prt2_p2 p2_1
-         ->  Group
-               Group Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
+                     Sort Key: p2_1.a, p2_1.b
+                     ->  Seq Scan on prt2_p1 p2_1
+         ->  Merge Full Join
+               Merge Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
+               Filter: ((COALESCE(prt1_2.a, p2_2.a) >= 490) AND (COALESCE(prt1_2.a, p2_2.a) <= 510))
                ->  Sort
-                     Sort Key: (COALESCE(prt1_2.a, p2_2.a)), (COALESCE(prt1_2.b, p2_2.b))
-                     ->  Merge Full Join
-                           Merge Cond: ((prt1_2.a = p2_2.a) AND (prt1_2.b = p2_2.b))
-                           Filter: ((COALESCE(prt1_2.a, p2_2.a) >= 490) AND (COALESCE(prt1_2.a, p2_2.a) <= 510))
-                           ->  Sort
-                                 Sort Key: prt1_2.a, prt1_2.b
-                                 ->  Seq Scan on prt1_p3 prt1_2
-                           ->  Sort
-                                 Sort Key: p2_2.a, p2_2.b
-                                 ->  Seq Scan on prt2_p3 p2_2
-(43 rows)
+                     Sort Key: prt1_2.a, prt1_2.b
+                     ->  Seq Scan on prt1_p2 prt1_2
+               ->  Sort
+                     Sort Key: p2_2.a, p2_2.b
+                     ->  Seq Scan on prt2_p2 p2_2
+         ->  Merge Full Join
+               Merge Cond: ((prt1_3.a = p2_3.a) AND (prt1_3.b = p2_3.b))
+               Filter: ((COALESCE(prt1_3.a, p2_3.a) >= 490) AND (COALESCE(prt1_3.a, p2_3.a) <= 510))
+               ->  Sort
+                     Sort Key: prt1_3.a, prt1_3.b
+                     ->  Seq Scan on prt1_p3 prt1_3
+               ->  Sort
+                     Sort Key: p2_3.a, p2_3.b
+                     ->  Seq Scan on prt2_p3 p2_3
+(30 rows)
 
 SELECT a, b FROM prt1 FULL JOIN prt2 p2(b,a,c) USING(a,b)
   WHERE a BETWEEN 490 AND 510
@@ -1555,41 +1542,39 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
- GroupAggregate
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ IndexAggregate
    Group Key: t1.c, t3.c
-   ->  Sort
-         Sort Key: t1.c, t3.c
-         ->  Append
+   ->  Append
+         ->  Hash Join
+               Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                           ->  Seq Scan on plt1_p1 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on plt2_p1 t2_1
+                     Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                     ->  Seq Scan on plt1_p1 t1_1
                      ->  Hash
-                           ->  Seq Scan on plt1_e_p1 t3_1
+                           ->  Seq Scan on plt2_p1 t2_1
+               ->  Hash
+                     ->  Seq Scan on plt1_e_p1 t3_1
+         ->  Hash Join
+               Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                           ->  Seq Scan on plt1_p2 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on plt2_p2 t2_2
+                     Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                     ->  Seq Scan on plt1_p2 t1_2
                      ->  Hash
-                           ->  Seq Scan on plt1_e_p2 t3_2
+                           ->  Seq Scan on plt2_p2 t2_2
+               ->  Hash
+                     ->  Seq Scan on plt1_e_p2 t3_2
+         ->  Hash Join
+               Hash Cond: (t1_3.c = ltrim(t3_3.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_3.c = ltrim(t3_3.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_3.b = t2_3.b) AND (t1_3.c = t2_3.c))
-                           ->  Seq Scan on plt1_p3 t1_3
-                           ->  Hash
-                                 ->  Seq Scan on plt2_p3 t2_3
+                     Hash Cond: ((t1_3.b = t2_3.b) AND (t1_3.c = t2_3.c))
+                     ->  Seq Scan on plt1_p3 t1_3
                      ->  Hash
-                           ->  Seq Scan on plt1_e_p3 t3_3
-(32 rows)
+                           ->  Seq Scan on plt2_p3 t2_3
+               ->  Hash
+                     ->  Seq Scan on plt1_e_p3 t3_3
+(30 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |          avg          |  c   |  c   |   c   
@@ -1703,41 +1688,39 @@ ANALYZE pht1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
- GroupAggregate
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
+ IndexAggregate
    Group Key: t1.c, t3.c
-   ->  Sort
-         Sort Key: t1.c, t3.c
-         ->  Append
+   ->  Append
+         ->  Hash Join
+               Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                           ->  Seq Scan on pht1_p1 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on pht2_p1 t2_1
+                     Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                     ->  Seq Scan on pht1_p1 t1_1
                      ->  Hash
-                           ->  Seq Scan on pht1_e_p1 t3_1
+                           ->  Seq Scan on pht2_p1 t2_1
+               ->  Hash
+                     ->  Seq Scan on pht1_e_p1 t3_1
+         ->  Hash Join
+               Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                           ->  Seq Scan on pht1_p2 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on pht2_p2 t2_2
+                     Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                     ->  Seq Scan on pht1_p2 t1_2
                      ->  Hash
-                           ->  Seq Scan on pht1_e_p2 t3_2
+                           ->  Seq Scan on pht2_p2 t2_2
+               ->  Hash
+                     ->  Seq Scan on pht1_e_p2 t3_2
+         ->  Hash Join
+               Hash Cond: (t1_3.c = ltrim(t3_3.c, 'A'::text))
                ->  Hash Join
-                     Hash Cond: (t1_3.c = ltrim(t3_3.c, 'A'::text))
-                     ->  Hash Join
-                           Hash Cond: ((t1_3.b = t2_3.b) AND (t1_3.c = t2_3.c))
-                           ->  Seq Scan on pht1_p3 t1_3
-                           ->  Hash
-                                 ->  Seq Scan on pht2_p3 t2_3
+                     Hash Cond: ((t1_3.b = t2_3.b) AND (t1_3.c = t2_3.c))
+                     ->  Seq Scan on pht1_p3 t1_3
                      ->  Hash
-                           ->  Seq Scan on pht1_e_p3 t3_3
-(32 rows)
+                           ->  Seq Scan on pht2_p3 t2_3
+               ->  Hash
+                     ->  Seq Scan on pht1_e_p3 t3_3
+(30 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |         avg          |  c   |  c   |   c   
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..ea40dfa9a30 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -706,18 +706,14 @@ alter table tenk2 reset (parallel_workers);
 set enable_hashagg = false;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
-                     QUERY PLAN                     
-----------------------------------------------------
- Finalize GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: twenty
-   ->  Gather Merge
+   ->  Gather
          Workers Planned: 4
-         ->  Partial GroupAggregate
-               Group Key: twenty
-               ->  Sort
-                     Sort Key: twenty
-                     ->  Parallel Seq Scan on tenk1
-(9 rows)
+         ->  Parallel Seq Scan on tenk1
+(5 rows)
 
 select count(*) from tenk1 group by twenty;
  count 
@@ -772,19 +768,15 @@ drop function sp_simple_func(integer);
 -- test handling of SRFs in targetlist (bug in 10.0)
 explain (costs off)
    select count(*), generate_series(1,2) from tenk1 group by twenty;
-                        QUERY PLAN                        
-----------------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  ProjectSet
-   ->  Finalize GroupAggregate
+   ->  IndexAggregate
          Group Key: twenty
-         ->  Gather Merge
+         ->  Gather
                Workers Planned: 4
-               ->  Partial GroupAggregate
-                     Group Key: twenty
-                     ->  Sort
-                           Sort Key: twenty
-                           ->  Parallel Seq Scan on tenk1
-(10 rows)
+               ->  Parallel Seq Scan on tenk1
+(6 rows)
 
 select count(*), generate_series(1,2) from tenk1 group by twenty;
  count | generate_series 
@@ -833,6 +825,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
                      QUERY PLAN                     
@@ -876,6 +869,7 @@ select count(*) from tenk1 group by twenty;
 reset parallel_leader_participation;
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 explain (costs off)
 select * from
   (select string4, count(unique2)
@@ -917,6 +911,7 @@ select * from
 (12 rows)
 
 reset enable_material;
+reset enable_indexagg;
 reset enable_hashagg;
 -- check parallelized int8 aggregate (bug #14897)
 explain (costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..d32bec316d3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -157,6 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashagg                 | on
  enable_hashjoin                | on
  enable_incremental_sort        | on
+ enable_indexagg                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
  enable_material                | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(25 rows)
+(26 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 850f5a5787f..f72eb367112 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1392,6 +1392,7 @@ CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -1623,12 +1624,100 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+ 
+--
+-- Index Aggregation tests
+--
+
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+-- mixing columns between group by and order by
+begin;
+
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+--
+-- Index Aggregation Spill tests
+--
+
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 
 --
 -- Hash Aggregation Spill tests
 --
 
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 
 select unique1, count(*), sum(twothousand) from tenk1
@@ -1657,6 +1746,7 @@ analyze agg_data_20k;
 -- Produce results with sorting.
 
 set enable_hashagg = false;
+set enable_indexagg = false;
 
 set jit_above_cost = 0;
 
@@ -1728,23 +1818,68 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
 set enable_sort = true;
 set work_mem to default;
 
+-- Produce results with index aggregation
+
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+
+set jit_above_cost = 0;
+
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+
+set jit_above_cost to default;
+
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
 -- Compare group aggregation results to hash aggregation results
 
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
 
 drop table agg_group_1;
 drop table agg_group_2;
@@ -1754,3 +1889,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index abe6d6ae09f..f9f4b5dcebd 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -35,6 +35,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
@@ -48,6 +49,25 @@ SELECT t1.a, avg(t2.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 
 --
@@ -71,6 +91,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
@@ -86,7 +107,27 @@ SELECT t1.a, avg(t2.c + t3.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 --
 -- Test that eager aggregation works for outer join
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index 7c725e2663a..75ee7b30fc1 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -55,8 +55,9 @@ EXPLAIN (COSTS OFF)
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
@@ -81,6 +82,33 @@ EXPLAIN (COSTS OFF)
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- When GROUP BY clause does not match; full aggregation by top node.
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+
+
+RESET enable_sort;
 RESET enable_hashagg;
 
 -- ROLLUP, partitionwise aggregation does not apply
@@ -135,10 +163,12 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 RESET enable_hashagg;
+RESET enable_indexagg;
 
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 71a75bc86ea..5f398219166 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -318,6 +318,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 
 explain (costs off)
    select count(*) from tenk1 group by twenty;
@@ -328,6 +329,7 @@ reset parallel_leader_participation;
 
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 
 explain (costs off)
 select * from
@@ -341,6 +343,7 @@ select * from
   right join (values (1),(2),(3)) v(x) on true;
 
 reset enable_material;
+reset enable_indexagg;
 
 reset enable_hashagg;
 
-- 
2.43.0

#7Sergey Soloviev
sergey.soloviev@tantorlabs.ru
In reply to: Sergey Soloviev (#6)
5 attachment(s)
Re: Introduce Index Aggregate - new GROUP BY strategy

Hi!

I have finally added support for Partial IndexAggregate. There was a problem with
a mismatch between sortgroupref and the target list entries, caused by the partial
aggregates in it. To solve this I had to add a new argument to 'create_agg_path',
'pathkeys', which is a List of PathKey.

Previously this information was calculated inside the function, just as
AGG_SORTED does. But when calculating the pathkeys we must consider whether
the rel is a child rel and, if so, use its parent to build the pathkeys
properly. The latter information is not known inside 'create_agg_path', so
instead of passing 'parent' we explicitly pass the already-built 'pathkeys'.
I did not change the AGG_SORTED logic, so this is used only by AGG_INDEX.

This logic is placed in another patch file just to make review of this change easier.

Also, the cost calculation logic is adjusted a bit: it now accounts for the
top-down index traversal, and the cost of the final external merge is added
only if a spill is expected.

---
Sergey Soloviev
TantorLabs: https://tantorlabs.com

Attachments:

v3-0001-add-in-memory-btree-tuple-index.patch (text/x-patch; charset=UTF-8)
From 3d3468260f40640fc9b5170e5197c409ddb4eed4 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 15:25:41 +0300
Subject: [PATCH v3 1/5] add in-memory btree tuple index

This patch implements an in-memory B+tree structure. It will be used as
the index for the new index-based grouping strategy.

The size of each node is set with a macro. For convenience it equals
2^n - 1, so for internal nodes we can find the split entry exactly in
the middle, and for leaf nodes we can distribute tuples between the two
resulting nodes uniformly (taking the position of the newly inserted
tuple into account).

It supports separate memory contexts for tracking memory allocations.
And just like TupleHashTable, lookup takes an 'isnew' pointer which,
when NULL, prevents new tuple creation (e.g. when the memory limit is
reached).

It also supports the key abbreviation optimization, like tuplesort.
Some of that code was copied verbatim, so it may be worth factoring the
shared logic into a common function.
---
 src/backend/executor/execGrouping.c | 643 ++++++++++++++++++++++++++++
 src/include/executor/executor.h     |  65 +++
 src/include/nodes/execnodes.h       |  84 ++++
 3 files changed, 792 insertions(+)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index 8eb4c25e1cb..c83a3f2223d 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -622,3 +622,646 @@ TupleHashTableMatch(struct tuplehash_hash *tb, MinimalTuple tuple1, MinimalTuple
 	econtext->ecxt_outertuple = slot1;
 	return !ExecQualAndReset(hashtable->cur_eq_func, econtext);
 }
+
+/*****************************************************************************
+ * 		Utility routines for all-in-memory btree index
+ * 
+ * These routines build btree index for grouping tuples together (eg, for
+ * index aggregation).  There is one entry for each not-distinct set of tuples
+ * presented.
+ *****************************************************************************/
+
+/* 
+ * Representation of the entry being searched for in the tuple index.  It has
+ * a separate representation to avoid the unnecessary memory allocation of
+ * building a MinimalTuple just to probe for a TupleIndexEntry.
+ */
+typedef struct TupleIndexSearchEntryData
+{
+	TupleTableSlot *slot;		/* search TupleTableSlot */
+	Datum	key1;				/* first searched key data */
+	bool	isnull1;			/* first searched key is null */
+} TupleIndexSearchEntryData;
+
+typedef TupleIndexSearchEntryData *TupleIndexSearchEntry;
+
+/* 
+ * compare_index_tuple_tiebreak
+ * 		Perform full comparison of tuples without key abbreviation.
+ * 
+ * Invoked when the first key (possibly abbreviated) cannot decide the
+ * comparison, so we have to compare the remaining keys.
+ */
+static inline int
+compare_index_tuple_tiebreak(TupleIndex index, TupleIndexEntry left,
+							 TupleIndexSearchEntry right)
+{
+	HeapTupleData ltup;
+	SortSupport sortKey = index->sortKeys;
+	TupleDesc tupDesc = index->tupDesc;
+	AttrNumber	attno;
+	Datum		datum1,
+				datum2;
+	bool		isnull1,
+				isnull2;
+	int			cmp;
+
+	ltup.t_len = left->tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	ltup.t_data = (HeapTupleHeader) ((char *) left->tuple - MINIMAL_TUPLE_OFFSET);
+	tupDesc = index->tupDesc;
+
+	if (sortKey->abbrev_converter)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortAbbrevFullComparator(datum1, isnull1,
+											datum2, isnull2,
+											sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	sortKey++;
+	for (int nkey = 1; nkey < index->nkeys; nkey++, sortKey++)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortComparator(datum1, isnull1,
+								  datum2, isnull2,
+								  sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+	
+	return 0;
+}
+
+/* 
+ * compare_index_tuple
+ * 		Compare pair of tuples during index lookup
+ * 
+ * The comparison honors key abbreviation.
+ */
+static int
+compare_index_tuple(TupleIndex index,
+					TupleIndexEntry left,
+					TupleIndexSearchEntry right)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+	int	cmp = 0;
+	
+	cmp = ApplySortComparator(left->key1, left->isnull1,
+							  right->key1, right->isnull1,
+							  sortKey);
+	if (cmp != 0)
+		return cmp;
+
+	return compare_index_tuple_tiebreak(index, left, right);
+}
+
+/* 
+ * tuple_index_node_bsearch
+ * 		Perform binary search in the index node.
+ * 
+ * On return, if 'found' is set to 'true', an exact match was found and the
+ * return value is an index into the 'tuples' array.  Otherwise the value is:
+ * - for internal nodes, the index in the 'pointers' array to follow
+ * - for leaf nodes, the index at which the new entry must be inserted.
+ */
+static int
+tuple_index_node_bsearch(TupleIndex index, TupleIndexNode node,
+						 TupleIndexSearchEntry search, bool *found)
+{
+	int low;
+	int high;
+	
+	low = 0;
+	high = node->ntuples;
+	*found = false;
+
+	while (low < high)
+	{
+		OffsetNumber mid = (low + high) / 2;
+		TupleIndexEntry mid_entry = node->tuples[mid];
+		int cmp;
+
+		cmp = compare_index_tuple(index, mid_entry, search);
+		if (cmp == 0)
+		{
+			*found = true;
+			return mid;
+		}
+
+		if (cmp < 0)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	return low;
+}
+
+static inline TupleIndexNode
+IndexLeafNodeGetNext(TupleIndexNode node)
+{
+	return node->pointers[0];
+}
+
+static inline void
+IndexLeafNodeSetNext(TupleIndexNode node, TupleIndexNode next)
+{
+	node->pointers[0] = next;
+}
+
+#define SizeofTupleIndexInternalNode \
+	  (offsetof(TupleIndexNodeData, pointers) \
+	+ (TUPLE_INDEX_NODE_MAX_ENTRIES + 1) * sizeof(TupleIndexNode))
+
+#define SizeofTupleIndexLeafNode \
+	offsetof(TupleIndexNodeData, pointers) + sizeof(TupleIndexNode)
+
+static inline TupleIndexNode
+AllocLeafIndexNode(TupleIndex index, TupleIndexNode next)
+{
+	TupleIndexNode leaf;
+	leaf = MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexLeafNode);
+	IndexLeafNodeSetNext(leaf, next);
+	return leaf;
+}
+
+static inline TupleIndexNode
+AllocInternalIndexNode(TupleIndex index)
+{
+	return MemoryContextAllocZero(index->nodecxt, SizeofTupleIndexInternalNode);
+}
+
+/* 
+ * tuple_index_node_insert_at
+ * 		Insert new tuple in the node at specified index
+ * 
+ * This function is called when a new tuple must be inserted into the node
+ * (both leaf and internal).  For internal nodes 'pointer' must also be specified.
+ *
+ * The node must have free space available.  It's up to the caller to check
+ * whether the node is full and needs splitting; for that use 'tuple_index_insert_split'.
+ */
+static inline void
+tuple_index_node_insert_at(TupleIndexNode node, bool is_leaf, int idx,
+						   TupleIndexEntry entry, TupleIndexNode pointer)
+{
+	int move_count;
+
+	Assert(node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES);
+	Assert(0 <= idx && idx <= node->ntuples);
+	move_count = node->ntuples - idx;
+
+	if (move_count > 0)
+		memmove(&node->tuples[idx + 1], &node->tuples[idx],
+			move_count * sizeof(TupleIndexEntry));
+
+	node->tuples[idx] = entry;
+
+	if (!is_leaf)
+	{
+		Assert(pointer != NULL);
+
+		if (move_count > 0)
+			memmove(&node->pointers[idx + 2], &node->pointers[idx + 1],
+					move_count * sizeof(TupleIndexNode));
+		node->pointers[idx + 1] = pointer;
+	}
+
+	node->ntuples++;
+}
+
+/* 
+ * Insert tuple to full node with page split.
+ * 
+ * 'split_node_out' - new page containing nodes on right side
+ * 'split_tuple_out' - tuple, which sent to the parent node as new separator key
+ */
+static void
+tuple_index_insert_split(TupleIndex index, TupleIndexNode node, bool is_leaf,
+						 int insert_pos, TupleIndexNode *split_node_out,
+						 TupleIndexEntry *split_entry_out)
+{
+	TupleIndexNode split;
+	int split_tuple_idx;
+
+	Assert(node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+	if (is_leaf)
+	{
+		/* 
+		 * The maximum number of tuples is kept odd, so we need to decide
+		 * at which index to split the page. We know the split happens
+		 * during an insert, so leave fewer entries on the page that
+		 * receives the insertion.
+		 */
+		if (TUPLE_INDEX_NODE_MAX_ENTRIES / 2 < insert_pos)
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2 + 1;
+		else
+			split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+
+		split = AllocLeafIndexNode(index, IndexLeafNodeGetNext(node));
+		split->ntuples = node->ntuples - split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[node->ntuples], 
+			   sizeof(TupleIndexEntry) * split->ntuples);
+		IndexLeafNodeSetNext(node, split);
+	}
+	else
+	{
+		/* 
+		 * After an internal-node split the split tuple is moved up to the
+		 * parent, and the max tuple count is odd, so dividing by 2 handles it.
+		 */
+		split_tuple_idx = TUPLE_INDEX_NODE_MAX_ENTRIES / 2;
+		split = AllocInternalIndexNode(index);
+		split->ntuples = split_tuple_idx;
+		node->ntuples = split_tuple_idx;
+		memcpy(&split->tuples[0], &node->tuples[split_tuple_idx + 1],
+				sizeof(TupleIndexEntry) * split->ntuples);
+		memcpy(&split->pointers[0], &node->pointers[split_tuple_idx + 1],
+				sizeof(TupleIndexNode) * (split->ntuples + 1));
+	}
+
+	*split_node_out = split;
+	*split_entry_out = node->tuples[split_tuple_idx];
+}
+
+static inline Datum
+mintup_getattr(MinimalTuple tup, TupleDesc tupdesc, AttrNumber attnum, bool *isnull)
+{
+	HeapTupleData htup;
+
+	htup.t_len = tup->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tup - MINIMAL_TUPLE_OFFSET);
+
+	return heap_getattr(&htup, attnum, tupdesc, isnull);
+}
+
+static TupleIndexEntry
+tuple_index_node_lookup(TupleIndex index,
+						TupleIndexNode node, int level,
+						TupleIndexSearchEntry search, bool *is_new,
+						TupleIndexNode *split_node_out,
+						TupleIndexEntry *split_entry_out)
+{
+	TupleIndexEntry entry;
+	int idx;
+	bool found;
+	bool is_leaf;
+
+	TupleIndexNode insert_pointer;
+	TupleIndexEntry insert_entry;
+	bool need_insert;
+
+	Assert(level >= 0);
+
+	idx = tuple_index_node_bsearch(index, node, search, &found);
+	if (found)
+	{
+		/* 
+		 * Both internal and leaf nodes store pointers to elements, so we can
+		 * safely return exact match found at each level.
+		 */
+		if (is_new)
+			*is_new = false;
+		return node->tuples[idx];
+	}
+
+	is_leaf = level == 0;
+	if (is_leaf)
+	{
+		MemoryContext oldcxt;
+
+		if (is_new == NULL)
+			return NULL;
+
+		oldcxt = MemoryContextSwitchTo(index->tuplecxt);
+
+		entry = palloc(sizeof(TupleIndexEntryData));
+		entry->tuple = ExecCopySlotMinimalTupleExtra(search->slot, index->additionalsize);
+
+		MemoryContextSwitchTo(oldcxt);
+
+		/* 
+		 * key1 in the search tuple is stored in a TupleTableSlot which has
+		 * its own lifetime, so we must not simply copy it.
+		 *
+		 * But if key abbreviation is in use then we should copy it from the
+		 * search tuple: this is safe (pass-by-value) and recomputing it can
+		 * skew the abbreviation statistics.
+		 */
+		if (index->sortKeys->abbrev_converter)
+		{
+			entry->isnull1 = search->isnull1;
+			entry->key1 = search->key1;
+		}
+		else
+		{
+			SortSupport sortKey = &index->sortKeys[0];
+			entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+										 sortKey->ssup_attno, &entry->isnull1);
+		}
+
+		index->ntuples++;
+
+		*is_new = true;
+		need_insert = true;
+		insert_pointer = NULL;
+		insert_entry = entry;
+	}
+	else
+	{
+		TupleIndexNode child_split_node = NULL;
+		TupleIndexEntry child_split_entry;
+
+		entry = tuple_index_node_lookup(index, node->pointers[idx], level - 1,
+										search, is_new,
+										&child_split_node, &child_split_entry);
+		if (entry == NULL)
+			return NULL;
+
+		if (child_split_node != NULL)
+		{
+			need_insert = true;
+			insert_pointer = child_split_node;
+			insert_entry = child_split_entry;
+		}
+		else
+			need_insert = false;
+	}
+	
+	if (need_insert)
+	{
+		Assert(insert_entry != NULL);
+
+		if (node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES)
+		{
+			TupleIndexNode split_node;
+			TupleIndexEntry split_entry;
+
+			tuple_index_insert_split(index, node, is_leaf, idx,
+									 &split_node, &split_entry);
+
+			/* adjust insertion index if the tuple is inserted into the split page */
+			if (node->ntuples < idx)
+			{
+				/* keep split tuple for leaf nodes and remove for internal */
+				if (is_leaf)
+					idx -= node->ntuples;
+				else
+					idx -= node->ntuples + 1;
+
+				node = split_node;
+			}
+
+			*split_node_out = split_node;
+			*split_entry_out = split_entry;
+		}
+
+		Assert(idx >= 0);
+		tuple_index_node_insert_at(node, is_leaf, idx, insert_entry, insert_pointer);
+	}
+
+	return entry;
+}
+
+static void
+remove_index_abbreviations(TupleIndex index)
+{
+	TupleIndexIteratorData iter;
+	TupleIndexEntry entry;
+	SortSupport sortKey = &index->sortKeys[0];
+
+	sortKey->comparator = sortKey->abbrev_full_comparator;
+	sortKey->abbrev_converter = NULL;
+	sortKey->abbrev_abort = NULL;
+	sortKey->abbrev_full_comparator = NULL;
+
+	/* now traverse all index entries and convert all existing keys */
+	InitTupleIndexIterator(index, &iter);
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 sortKey->ssup_attno, &entry->isnull1);
+}
+
+static inline void
+prepare_search_index_tuple(TupleIndex index, TupleTableSlot *slot,
+						   TupleIndexSearchEntry entry)
+{
+	SortSupport	sortKey;
+
+	sortKey = &index->sortKeys[0];
+
+	entry->slot = slot;
+	entry->key1 = slot_getattr(slot, sortKey->ssup_attno, &entry->isnull1);
+
+	/* NULL can not be abbreviated */
+	if (entry->isnull1)
+		return;
+
+	/* abbreviation is not used */
+	if (!sortKey->abbrev_converter)
+		return;
+
+	/* check if abbreviation should be removed */
+	if (index->abbrevNext <= index->ntuples)
+	{
+		index->abbrevNext *= 2;
+
+		if (sortKey->abbrev_abort(index->ntuples, sortKey))
+		{
+			remove_index_abbreviations(index);
+			return;
+		}
+	}
+
+	entry->key1 = sortKey->abbrev_converter(entry->key1, sortKey);
+}
+
+TupleIndexEntry
+TupleIndexLookup(TupleIndex index, TupleTableSlot *searchslot, bool *is_new)
+{
+	TupleIndexEntry entry;
+	TupleIndexSearchEntryData search_entry;
+	TupleIndexNode split_node = NULL;
+	TupleIndexEntry split_entry;
+	TupleIndexNode new_root;
+
+	prepare_search_index_tuple(index, searchslot, &search_entry);
+
+	entry = tuple_index_node_lookup(index, index->root, index->height,
+									&search_entry, is_new, &split_node, &split_entry);
+
+	if (entry == NULL)
+		return NULL;
+
+	if (split_node == NULL)
+		return entry;
+
+	/* root split */
+	new_root = AllocInternalIndexNode(index);
+	new_root->ntuples = 1;
+	new_root->tuples[0] = split_entry;
+	new_root->pointers[0] = index->root;
+	new_root->pointers[1] = split_node;
+	index->root = new_root;
+	index->height++;
+
+	return entry;
+}
+
+void
+InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	TupleIndexNode min_node;
+	int level;
+
+	/* iterate to the left-most node */
+	min_node = index->root;
+	level = index->height;
+	while (level-- > 0)
+		min_node = min_node->pointers[0];
+
+	iter->cur_leaf = min_node;
+	iter->cur_idx = 0;
+}
+
+TupleIndexEntry
+TupleIndexIteratorNext(TupleIndexIterator iter)
+{
+	TupleIndexNode leaf = iter->cur_leaf;
+	TupleIndexEntry tuple;
+
+	if (leaf == NULL)
+		return NULL;
+
+	/* this also handles single empty root node case */
+	if (leaf->ntuples <= iter->cur_idx)
+	{
+		leaf = iter->cur_leaf = IndexLeafNodeGetNext(leaf);
+		if (leaf == NULL)
+			return NULL;
+		iter->cur_idx = 0;
+	}
+
+	tuple = leaf->tuples[iter->cur_idx];
+	iter->cur_idx++;
+	return tuple;
+}
+
+/* 
+ * Construct an empty TupleIndex
+ *
+ * inputDesc: tuple descriptor for input tuples
+ * nkeys: number of columns to be compared (length of next 4 arrays)
+ * attNums: attribute numbers used for grouping in sort order
+ * sortOperators: Oids of sort operators used for comparisons
+ * sortCollations: collations used for comparisons
+ * nullsFirstFlags: per-column NULLS FIRST/LAST ordering flags
+ * additionalsize: size of data that may be stored along with the index entry
+ * 				   used for storing per-trans information during aggregation
+ * metacxt: memory context for TupleIndex itself
+ * tuplecxt: memory context for storing MinimalTuples
+ * nodecxt: memory context for storing index nodes
+ */
+TupleIndex
+BuildTupleIndex(TupleDesc inputDesc,
+				int nkeys,
+				AttrNumber *attNums,
+				Oid *sortOperators,
+				Oid *sortCollations,
+				bool *nullsFirstFlags,
+				Size additionalsize,
+				MemoryContext metacxt,
+				MemoryContext tuplecxt,
+				MemoryContext nodecxt)
+{
+	TupleIndex index;
+	MemoryContext oldcxt;
+
+	Assert(nkeys > 0);
+
+	additionalsize = MAXALIGN(additionalsize);
+
+	oldcxt = MemoryContextSwitchTo(metacxt);
+
+	index = (TupleIndex) palloc(sizeof(TupleIndexData));
+	index->tuplecxt = tuplecxt;
+	index->nodecxt = nodecxt;
+	index->additionalsize = additionalsize;
+	index->tupDesc = CreateTupleDescCopy(inputDesc);
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->ntuples = 0;
+	index->height = 0;
+
+	index->nkeys = nkeys;
+	index->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (int i = 0; i < nkeys; ++i)
+	{
+		SortSupport sortKey = &index->sortKeys[i];
+
+		Assert(AttributeNumberIsValid(attNums[i]));
+		Assert(OidIsValid(sortOperators[i]));
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* abbreviation applies only for the first key */
+		sortKey->abbreviate = i == 0;
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/* Update abbreviation information */
+	if (index->sortKeys[0].abbrev_converter != NULL)
+	{
+		index->abbrevUsed = true;
+		index->abbrevNext = 10;
+		index->abbrevSortOp = sortOperators[0];
+	}
+	else
+		index->abbrevUsed = false;
+
+	MemoryContextSwitchTo(oldcxt);
+	return index;
+}
+
+/* 
+ * Resets contents of the index to be empty, preserving all the non-content
+ * state.
+ */
+void
+ResetTupleIndex(TupleIndex index)
+{
+	SortSupport ssup;
+
+	/* by this time the node and tuple memory contexts must have been reset by the caller */
+	index->root = AllocLeafIndexNode(index, NULL);
+	index->height = 0;
+	index->ntuples = 0;
+	
+	if (!index->abbrevUsed)
+		return;
+
+	/* 
+	 * If key abbreviation is used then we must reset its state.
+	 * All fields in SortSupport are already set up, but we clear
+	 * some of them to make it look as if this were the first-time setup.
+	 */
+	ssup = &index->sortKeys[0];
+	ssup->comparator = NULL;
+	PrepareSortSupportFromOrderingOp(index->abbrevSortOp, ssup);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 7cd6a49309f..90c8ba7c779 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -198,6 +198,71 @@ TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
 }
 #endif
 
+extern TupleIndex BuildTupleIndex(TupleDesc inputDesc,
+								  int nkeys,
+								  AttrNumber *attNums,
+								  Oid *sortOperators,
+								  Oid *sortCollations,
+								  bool *nullsFirstFlags,
+								  Size additionalsize,
+								  MemoryContext metacxt,
+								  MemoryContext tuplecxt,
+								  MemoryContext nodecxt);
+extern TupleIndexEntry TupleIndexLookup(TupleIndex index, TupleTableSlot *search,
+		  								bool *is_new);
+extern void ResetTupleIndex(TupleIndex index);
+
+/* 
+ * Start iteration over the tuples in the index.  Only ascending order is
+ * supported.  No modifications are allowed during iteration, as they can
+ * break the iterator.
+ */
+extern void	InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter);
+extern TupleIndexEntry TupleIndexIteratorNext(TupleIndexIterator iter);
+static inline void
+ResetTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	InitTupleIndexIterator(index, iter);
+}
+
+#ifndef FRONTEND
+
+/* 
+ * Return size of the index entry. Useful for estimating memory usage.
+ */
+static inline size_t
+TupleIndexEntrySize(void)
+{
+	return sizeof(TupleIndexEntryData);
+}
+
+/* 
+ * Get a pointer to the additional space allocated for this entry. The
+ * memory will be maxaligned and zeroed.
+ * 
+ * The amount of space available is the additionalsize requested in the call
+ * to BuildTupleIndex(). If additionalsize was specified as zero, return
+ * NULL.
+ */
+static inline void *
+TupleIndexEntryGetAdditional(TupleIndex index, TupleIndexEntry entry)
+{
+	if (index->additionalsize > 0)
+		return (char *) (entry->tuple) - index->additionalsize;
+	else
+		return NULL;
+}
+
+/* 
+ * Return tuple from index entry
+ */
+static inline MinimalTuple
+TupleIndexEntryGetMinimalTuple(TupleIndexEntry entry)
+{
+	return entry->tuple;
+}
+
+#endif
+
 /*
  * prototypes from functions in execJunk.c
  */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 64ff6996431..c9b69a96b26 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -900,7 +900,91 @@ typedef tuplehash_iterator TupleHashIterator;
 #define ScanTupleHashTable(htable, iter) \
 	tuplehash_iterate(htable->hashtab, iter)
 
+/* ---------------------------------------------------------------
+ * 				Tuple Btree index
+ *
+ * All-in-memory tuple Btree index used for grouping and aggregating.
+ * ---------------------------------------------------------------
+ */
+
+/* 
+ * Representation of a tuple in the index.  It stores both the tuple and
+ * the first key.  If key abbreviation is used, the first key holds the
+ * abbreviated value.
+ */
+typedef struct TupleIndexEntryData
+{
+	MinimalTuple tuple;	/* actual stored tuple */
+	Datum	key1;		/* value of first key */
+	bool	isnull1;	/* first key is null */
+} TupleIndexEntryData;
+
+typedef TupleIndexEntryData *TupleIndexEntry;
+
+/* 
+ * Btree node of tuple index. Common for both internal and leaf nodes.
+ */
+typedef struct TupleIndexNodeData
+{
+	/* amount of tuples in the node */
+	int ntuples;
 
+/* 
+ * Maximum number of tuples stored in a tuple index node.
+ *
+ * NOTE: use a 2^n - 1 count, so the tuples fully utilize cache lines
+ *       (except the first, because of 'ntuples' padding)
+ */
+#define TUPLE_INDEX_NODE_MAX_ENTRIES  63
+
+	/* 
+	 * Array of tuples for this page.
+	 *
+	 * For internal nodes these are separator keys;
+	 * for leaf nodes, the actual tuples.
+	 */
+	TupleIndexEntry tuples[TUPLE_INDEX_NODE_MAX_ENTRIES];
+
+	/* 
+	 * for internal nodes this is an array with size
+	 * TUPLE_INDEX_NODE_MAX_ENTRIES + 1 - pointers to nodes below.
+	 * 
+	 * for leaf nodes this is an array of 1 element - pointer to sibling
+	 * node required for iteration
+	 */
+	struct TupleIndexNodeData *pointers[FLEXIBLE_ARRAY_MEMBER];
+} TupleIndexNodeData;
+
+typedef TupleIndexNodeData *TupleIndexNode;
+
+typedef struct TupleIndexData
+{
+	TupleDesc	tupDesc;		/* descriptor for stored tuples */
+	TupleIndexNode root;		/* root of the tree */
+	int		height;				/* current tree height */
+	int		ntuples;			/* number of tuples in index */
+	int		nkeys;				/* amount of keys in tuple */
+	SortSupport	sortKeys;		/* support functions for key comparison */
+	MemoryContext	tuplecxt;	/* memory context containing tuples */
+	MemoryContext	nodecxt;	/* memory context containing index nodes */
+	Size	additionalsize;		/* size of additional data for tuple */
+	int		abbrevNext;			/* next time we should check abbreviation 
+									* optimization efficiency */
+	bool	abbrevUsed;			/* true if key abbreviation optimization
+									* was ever used */
+	Oid		abbrevSortOp;		/* sort operator for first key */
+} TupleIndexData;
+
+typedef struct TupleIndexData *TupleIndex;
+
+typedef struct TupleIndexIteratorData
+{
+	TupleIndexNode	cur_leaf;	/* current leaf node */
+	OffsetNumber	cur_idx;	/* index of tuple to return next */
+} TupleIndexIteratorData;
+
+typedef TupleIndexIteratorData *TupleIndexIterator;
+	
 /* ----------------------------------------------------------------
  *				 Expression State Nodes
  *
-- 
2.43.0

v3-0002-introduce-AGG_INDEX-grouping-strategy-node.patch (text/x-patch; charset=UTF-8)
From 91a233a6df2c59729d107cd73b1b11a2bc545006 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 16:41:58 +0300
Subject: [PATCH v3 2/5] introduce AGG_INDEX grouping strategy node

AGG_INDEX is a new grouping strategy that builds an in-memory index and
uses it for grouping. The main advantage of this approach is that the
output is ordered by the grouping columns, and if an ORDER BY is
specified it is used when building the grouping/sorting columns.

For the index it uses the B+tree implemented in the previous commit.
Overall its implementation is very close to AGG_HASHED:

- maintain an in-memory grouping structure
- track memory consumption
- if the memory limit is reached, spill data to disk in batches (using
  a hash of the key columns)
- process the batches one after another, filling a new in-memory
  structure for each batch

For this reason much of the code is generalized to support both the
index and hash implementations: functions are generalized using flag
arguments (replacing the boolean 'ishash'), spill-related members in
AggState are renamed with the prefix 'spill_' instead of 'hash_', etc.

Most differences are in the spill logic: to preserve sort order in case
of a disk spill we must dump each index to disk as a sorted run and
perform a final external merge.

One problem is the external merge. It is adapted from tuplesort.c by
introducing a new operational mode, tuplemerge (with its own prefix).
Internally we just set up the state accordingly and proceed as before
without any significant code changes.

Another problem is which tuples to save into the sorted runs. We
decided to store tuples after projection (when their aggregates are
finalized), because the internal transition state is represented by a
value/isnull/novalue triple (in AggStatePerGroupData) which is quite
hard to serialize and handle. After projection all GROUP BY attributes
are preserved, so we can still access them during the merge. Also,
projection applies the filter, so it can discard some tuples.
---
 src/backend/executor/execExpr.c            |   31 +-
 src/backend/executor/nodeAgg.c             | 1379 +++++++++++++++++---
 src/backend/utils/sort/tuplesort.c         |  209 ++-
 src/backend/utils/sort/tuplesortvariants.c |  105 ++
 src/include/executor/executor.h            |   10 +-
 src/include/executor/nodeAgg.h             |   33 +-
 src/include/nodes/execnodes.h              |   61 +-
 src/include/nodes/nodes.h                  |    1 +
 src/include/nodes/plannodes.h              |    2 +-
 src/include/utils/tuplesort.h              |   17 +-
 10 files changed, 1620 insertions(+), 228 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c35744b105e..117d7ba31d0 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -94,7 +94,7 @@ static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 								  ExprEvalStep *scratch,
 								  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-								  int transno, int setno, int setoff, bool ishash,
+								  int transno, int setno, int setoff, int strategy,
 								  bool nullcheck);
 static void ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 							 Datum *resv, bool *resnull,
@@ -3667,7 +3667,7 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
  */
 ExprState *
 ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
-				  bool doSort, bool doHash, bool nullcheck)
+				  int groupStrategy, bool nullcheck)
 {
 	ExprState  *state = makeNode(ExprState);
 	PlanState  *parent = &aggstate->ss.ps;
@@ -3925,7 +3925,7 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		 * grouping set). Do so for both sort and hash based computations, as
 		 * applicable.
 		 */
-		if (doSort)
+		if (groupStrategy & GROUPING_STRATEGY_SORT)
 		{
 			int			processGroupingSets = Max(phase->numsets, 1);
 			int			setoff = 0;
@@ -3933,13 +3933,13 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < processGroupingSets; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, false,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_SORT, nullcheck);
 				setoff++;
 			}
 		}
 
-		if (doHash)
+		if (groupStrategy & GROUPING_STRATEGY_HASH)
 		{
 			int			numHashes = aggstate->num_hashes;
 			int			setoff;
@@ -3953,12 +3953,19 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < numHashes; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, true,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_HASH, nullcheck);
 				setoff++;
 			}
 		}
 
+		if (groupStrategy & GROUPING_STRATEGY_INDEX)
+		{
+			ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
+								  pertrans, transno, 0, 0,
+								  GROUPING_STRATEGY_INDEX, nullcheck);
+		}
+
 		/* adjust early bail out jump target(s) */
 		foreach(bail, adjust_bailout)
 		{
@@ -4011,16 +4018,18 @@ static void
 ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 					  ExprEvalStep *scratch,
 					  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-					  int transno, int setno, int setoff, bool ishash,
+					  int transno, int setno, int setoff, int strategy,
 					  bool nullcheck)
 {
 	ExprContext *aggcontext;
 	int			adjust_jumpnull = -1;
 
-	if (ishash)
+	if (strategy & GROUPING_STRATEGY_HASH)
 		aggcontext = aggstate->hashcontext;
-	else
+	else if (strategy & GROUPING_STRATEGY_SORT)
 		aggcontext = aggstate->aggcontexts[setno];
+	else
+		aggcontext = aggstate->indexcontext;
 
 	/* add check for NULL pointer? */
 	if (nullcheck)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index a18556f62ec..1284c928c50 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -364,7 +364,7 @@ typedef struct FindColsContext
 	Bitmapset  *unaggregated;	/* other column references */
 } FindColsContext;
 
-static void select_current_set(AggState *aggstate, int setno, bool is_hash);
+static void select_current_set(AggState *aggstate, int setno, int strategy);
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
 static void initialize_aggregates(AggState *aggstate,
@@ -403,8 +403,8 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
 static bool find_cols_walker(Node *node, FindColsContext *context);
 static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
-static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
-										  bool nullcheck);
+static void agg_recompile_expressions(AggState *aggstate, bool minslot,
+									  bool nullcheck);
 static void hash_create_memory(AggState *aggstate);
 static double hash_choose_num_buckets(double hashentrysize,
 									  double ngroups, Size memory);
@@ -431,13 +431,13 @@ static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
 									   int64 input_tuples, double input_card,
 									   int used_bits);
 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
-static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
-							   int used_bits, double input_groups,
-							   double hashentrysize);
-static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-								TupleTableSlot *inputslot, uint32 hash);
-static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
-								 int setno);
+static void agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
+						   int used_bits, double input_groups,
+						   double hashentrysize);
+static Size agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+							TupleTableSlot *inputslot, uint32 hash);
+static void agg_spill_finish(AggState *aggstate, HashAggSpill *spill,
+							 int setno);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  AggState *aggstate, EState *estate,
@@ -446,21 +446,27 @@ static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  Oid aggdeserialfn, Datum initValue,
 									  bool initValueIsNull, Oid *inputTypes,
 									  int numArguments);
-
+static void agg_fill_index(AggState *aggstate);
+static TupleTableSlot *agg_retrieve_index(AggState *aggstate);
+static void lookup_index_entries(AggState *aggstate);
+static void indexagg_finish_initial_spills(AggState *aggstate);
+static void index_agg_enter_spill_mode(AggState *aggstate);
 
 /*
  * Select the current grouping set; affects current_set and
  * curaggcontext.
  */
 static void
-select_current_set(AggState *aggstate, int setno, bool is_hash)
+select_current_set(AggState *aggstate, int setno, int strategy)
 {
 	/*
 	 * When changing this, also adapt ExecAggPlainTransByVal() and
 	 * ExecAggPlainTransByRef().
 	 */
-	if (is_hash)
+	if (strategy == GROUPING_STRATEGY_HASH)
 		aggstate->curaggcontext = aggstate->hashcontext;
+	else if (strategy == GROUPING_STRATEGY_INDEX)
+		aggstate->curaggcontext = aggstate->indexcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -680,7 +686,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerGroup pergroup = pergroups[setno];
 
-		select_current_set(aggstate, setno, false);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_SORT);
 
 		for (transno = 0; transno < numTrans; transno++)
 		{
@@ -1478,7 +1484,7 @@ build_hash_tables(AggState *aggstate)
 			continue;
 		}
 
-		memory = aggstate->hash_mem_limit / aggstate->num_hashes;
+		memory = aggstate->spill_mem_limit / aggstate->num_hashes;
 
 		/* choose reasonable number of buckets per hashtable */
 		nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
@@ -1496,7 +1502,7 @@ build_hash_tables(AggState *aggstate)
 		build_hash_table(aggstate, setno, nbuckets);
 	}
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 }
 
 /*
@@ -1728,7 +1734,7 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 }
 
 /*
- * hashagg_recompile_expressions()
+ * agg_recompile_expressions()
  *
  * Identifies the right phase, compiles the right expression given the
  * arguments, and then sets phase->evalfunc to that expression.
@@ -1746,34 +1752,47 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
  * expressions in the AggStatePerPhase, and reuse when appropriate.
  */
 static void
-hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
+agg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 {
 	AggStatePerPhase phase;
 	int			i = minslot ? 1 : 0;
 	int			j = nullcheck ? 1 : 0;
 
 	Assert(aggstate->aggstrategy == AGG_HASHED ||
-		   aggstate->aggstrategy == AGG_MIXED);
+		   aggstate->aggstrategy == AGG_MIXED ||
+		   aggstate->aggstrategy == AGG_INDEX);
 
-	if (aggstate->aggstrategy == AGG_HASHED)
-		phase = &aggstate->phases[0];
-	else						/* AGG_MIXED */
+	if (aggstate->aggstrategy == AGG_MIXED)
 		phase = &aggstate->phases[1];
+	else						/* AGG_HASHED or AGG_INDEX */
+		phase = &aggstate->phases[0];
 
 	if (phase->evaltrans_cache[i][j] == NULL)
 	{
 		const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
 		bool		outerfixed = aggstate->ss.ps.outeropsfixed;
-		bool		dohash = true;
-		bool		dosort = false;
+		int			strategy = 0;
 
-		/*
-		 * If minslot is true, that means we are processing a spilled batch
-		 * (inside agg_refill_hash_table()), and we must not advance the
-		 * sorted grouping sets.
-		 */
-		if (aggstate->aggstrategy == AGG_MIXED && !minslot)
-			dosort = true;
+		switch (aggstate->aggstrategy)
+		{
+			case AGG_MIXED:
+				/*
+				 * If minslot is true, that means we are processing a spilled batch
+				 * (inside agg_refill_hash_table()), and we must not advance the
+				 * sorted grouping sets.
+				 */
+				if (!minslot)
+					strategy |= GROUPING_STRATEGY_SORT;
+				/* FALLTHROUGH */
+			case AGG_HASHED:
+				strategy |= GROUPING_STRATEGY_HASH;
+				break;
+			case AGG_INDEX:
+				strategy |= GROUPING_STRATEGY_INDEX;
+				break;
+			default:
+				Assert(false);
+		}
 
 		/* temporarily change the outerops while compiling the expression */
 		if (minslot)
@@ -1783,8 +1802,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 		}
 
 		phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
-														 dosort, dohash,
-														 nullcheck);
+														 strategy, nullcheck);
 
 		/* change back */
 		aggstate->ss.ps.outerops = outerops;
@@ -1803,9 +1821,9 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
  * substantially larger than the initial value.
  */
 void
-hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
-					Size *mem_limit, uint64 *ngroups_limit,
-					int *num_partitions)
+agg_set_limits(double hashentrysize, double input_groups, int used_bits,
+			   Size *mem_limit, uint64 *ngroups_limit,
+			   int *num_partitions)
 {
 	int			npartitions;
 	Size		partition_mem;
@@ -1853,6 +1871,18 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		*ngroups_limit = 1;
 }
 
+static inline bool
+agg_spill_required(AggState *aggstate, Size total_mem)
+{
+	/*
+	 * Don't spill unless there's at least one group in the hash table so we
+	 * can be sure to make progress even in edge cases.
+	 */
+	return aggstate->spill_ngroups_current > 0 &&
+			(total_mem > aggstate->spill_mem_limit ||
+			 aggstate->spill_ngroups_current > aggstate->spill_ngroups_limit);
+}
+
 /*
  * hash_agg_check_limits
  *
@@ -1863,7 +1893,6 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 static void
 hash_agg_check_limits(AggState *aggstate)
 {
-	uint64		ngroups = aggstate->hash_ngroups_current;
 	Size		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
 													 true);
 	Size		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt,
@@ -1874,7 +1903,7 @@ hash_agg_check_limits(AggState *aggstate)
 	bool		do_spill = false;
 
 #ifdef USE_INJECTION_POINTS
-	if (ngroups >= 1000)
+	if (aggstate->spill_ngroups_current >= 1000)
 	{
 		if (IS_INJECTION_POINT_ATTACHED("hash-aggregate-spill-1000"))
 		{
@@ -1888,9 +1917,7 @@ hash_agg_check_limits(AggState *aggstate)
 	 * Don't spill unless there's at least one group in the hash table so we
 	 * can be sure to make progress even in edge cases.
 	 */
-	if (aggstate->hash_ngroups_current > 0 &&
-		(total_mem > aggstate->hash_mem_limit ||
-		 ngroups > aggstate->hash_ngroups_limit))
+	if (agg_spill_required(aggstate, total_mem))
 	{
 		do_spill = true;
 	}
@@ -1899,68 +1926,150 @@ hash_agg_check_limits(AggState *aggstate)
 		hash_agg_enter_spill_mode(aggstate);
 }
 
+static void
+index_agg_check_limits(AggState *aggstate)
+{
+	Size		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt,
+													 true);
+	Size		node_mem = MemoryContextMemAllocated(aggstate->index_nodecxt,
+													 true);
+	Size		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt,
+													  true);
+	Size		tval_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory,
+													 true);
+	Size		total_mem = meta_mem + node_mem + entry_mem + tval_mem;
+	bool		do_spill = false;
+
+#ifdef USE_INJECTION_POINTS
+	if (aggstate->spill_ngroups_current >= 1000)
+	{
+		if (IS_INJECTION_POINT_ATTACHED("index-aggregate-spill-1000"))
+		{
+			do_spill = true;
+			INJECTION_POINT_CACHED("index-aggregate-spill-1000", NULL);
+		}
+	}
+#endif
+
+	if (agg_spill_required(aggstate, total_mem))
+	{
+		do_spill = true;
+	}
+
+	if (do_spill)
+		index_agg_enter_spill_mode(aggstate);
+}
+
 /*
  * Enter "spill mode", meaning that no new groups are added to any of the hash
  * tables. Tuples that would create a new group are instead spilled, and
  * processed later.
  */
-static void
-hash_agg_enter_spill_mode(AggState *aggstate)
+static inline void
+agg_enter_spill_mode(AggState *aggstate, bool ishash)
 {
-	INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
-	aggstate->hash_spill_mode = true;
-	hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
-
-	if (!aggstate->hash_ever_spilled)
+	if (ishash)
 	{
-		Assert(aggstate->hash_tapeset == NULL);
-		Assert(aggstate->hash_spills == NULL);
-
-		aggstate->hash_ever_spilled = true;
-
-		aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
+		INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->table_filled, true);
+	}
+	else
+	{
+		INJECTION_POINT("index-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->index_filled, true);
+	}
+
+	if (!aggstate->spill_ever_happened)
+	{
+		Assert(aggstate->spill_tapeset == NULL);
+		Assert(aggstate->spills == NULL);
 
-		aggstate->hash_spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+		aggstate->spill_ever_happened = true;
+		aggstate->spill_tapeset = LogicalTapeSetCreate(true, NULL, -1);
 
-		for (int setno = 0; setno < aggstate->num_hashes; setno++)
+		if (ishash)
 		{
-			AggStatePerHash perhash = &aggstate->perhash[setno];
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
-
-			hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
+			aggstate->spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+
+			for (int setno = 0; setno < aggstate->num_hashes; setno++)
+			{
+				AggStatePerHash perhash = &aggstate->perhash[setno];
+				HashAggSpill *spill = &aggstate->spills[setno];
+
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
 							   perhash->aggnode->numGroups,
 							   aggstate->hashentrysize);
+			}
+		}
+		else
+		{
+			aggstate->spills = palloc(sizeof(HashAggSpill));
+			agg_spill_init(aggstate->spills, aggstate->spill_tapeset, 0,
+						   aggstate->perindex->aggnode->numGroups,
+						   aggstate->hashentrysize);
 		}
 	}
 }
 
+static void
+hash_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, true);
+}
+
+static void
+index_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, false);
+}
+
 /*
  * Update metrics after filling the hash table.
  *
  * If reading from the outer plan, from_tape should be false; if reading from
  * another tape, from_tape should be true.
  */
-static void
-hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+static inline void
+agg_update_spill_metrics(AggState *aggstate, bool from_tape, int npartitions, bool ishash)
 {
 	Size		meta_mem;
 	Size		entry_mem;
-	Size		hashkey_mem;
+	Size		key_mem;
 	Size		buffer_mem;
 	Size		total_mem;
 
 	if (aggstate->aggstrategy != AGG_MIXED &&
-		aggstate->aggstrategy != AGG_HASHED)
+		aggstate->aggstrategy != AGG_HASHED &&
+		aggstate->aggstrategy != AGG_INDEX)
 		return;
 
-	/* memory for the hash table itself */
-	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
-
-	/* memory for hash entries */
-	entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
-
-	/* memory for byref transition states */
-	hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	if (ishash)
+	{
+		/* memory for the hash table itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
+
+		/* memory for hash entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	}
+	else
+	{
+		/* memory for the index itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt, true);
+
+		/* memory for the index nodes */
+		meta_mem += MemoryContextMemAllocated(aggstate->index_nodecxt, true);
+
+		/* memory for index entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory, true);
+	}
 
 	/* memory for read/write tape buffers, if spilled */
 	buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
@@ -1968,28 +2077,49 @@ hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
 		buffer_mem += HASHAGG_READ_BUFFER_SIZE;
 
 	/* update peak mem */
-	total_mem = meta_mem + entry_mem + hashkey_mem + buffer_mem;
-	if (total_mem > aggstate->hash_mem_peak)
-		aggstate->hash_mem_peak = total_mem;
+	total_mem = meta_mem + entry_mem + key_mem + buffer_mem;
+	if (total_mem > aggstate->spill_mem_peak)
+		aggstate->spill_mem_peak = total_mem;
 
 	/* update disk usage */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		uint64		disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
+		uint64		disk_used = LogicalTapeSetBlocks(aggstate->spill_tapeset) * (BLCKSZ / 1024);
 
-		if (aggstate->hash_disk_used < disk_used)
-			aggstate->hash_disk_used = disk_used;
+		if (aggstate->spill_disk_used < disk_used)
+			aggstate->spill_disk_used = disk_used;
 	}
 
 	/* update hashentrysize estimate based on contents */
-	if (aggstate->hash_ngroups_current > 0)
+	if (aggstate->spill_ngroups_current > 0)
 	{
-		aggstate->hashentrysize =
-			TupleHashEntrySize() +
-			(hashkey_mem / (double) aggstate->hash_ngroups_current);
+		if (ishash)
+		{
+			aggstate->hashentrysize =
+				TupleHashEntrySize() +
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
+		else
+		{
+			/* index stores MinimalTuples directly without any wrapper */
+			aggstate->hashentrysize =
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
 	}
 }
 
+static void
+hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, true);
+}
+
+static void
+index_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, false);
+}
+
 /*
  * Create memory contexts used for hash aggregation.
  */
@@ -2048,6 +2178,33 @@ hash_create_memory(AggState *aggstate)
 
 }
 
+/*
+ * Create memory contexts used for index aggregation.
+ */
+static void
+index_create_memory(AggState *aggstate)
+{
+	Size		maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
+
+	aggstate->indexcontext = CreateWorkExprContext(aggstate->ss.ps.state);
+
+	aggstate->index_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													"IndexAgg meta context",
+													ALLOCSET_DEFAULT_SIZES);
+	aggstate->index_nodecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg node context",
+												ALLOCSET_SMALL_SIZES);
+
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+	aggstate->index_entrycxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													 "IndexAgg entry context",
+													 ALLOCSET_DEFAULT_MINSIZE,
+													 ALLOCSET_DEFAULT_INITSIZE,
+													 maxBlockSize);
+}
+
 /*
  * Choose a reasonable number of buckets for the initial hash table size.
  */
@@ -2141,7 +2298,7 @@ initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
 	AggStatePerGroup pergroup;
 	int			transno;
 
-	aggstate->hash_ngroups_current++;
+	aggstate->spill_ngroups_current++;
 	hash_agg_check_limits(aggstate);
 
 	/* no need to allocate or initialize per-group state */
@@ -2196,9 +2353,9 @@ lookup_hash_entries(AggState *aggstate)
 		bool	   *p_isnew;
 
 		/* if hash table already spilled, don't create new entries */
-		p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
-		select_current_set(aggstate, setno, true);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_HASH);
 		prepare_hash_slot(perhash,
 						  outerslot,
 						  hashslot);
@@ -2214,15 +2371,15 @@ lookup_hash_entries(AggState *aggstate)
 		}
 		else
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 			TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
 
 			if (spill->partitions == NULL)
-				hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
-								   perhash->aggnode->numGroups,
-								   aggstate->hashentrysize);
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perhash->aggnode->numGroups,
+							   aggstate->hashentrysize);
 
-			hashagg_spill_tuple(aggstate, spill, slot, hash);
+			agg_spill_tuple(aggstate, spill, slot, hash);
 			pergroup[setno] = NULL;
 		}
 	}
@@ -2265,6 +2422,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_INDEX:
+				if (!node->index_filled)
+					agg_fill_index(node);
+
+				result = agg_retrieve_index(node);
+				break;
 		}
 
 		if (!TupIsNull(result))
@@ -2381,7 +2544,7 @@ agg_retrieve_direct(AggState *aggstate)
 				aggstate->table_filled = true;
 				ResetTupleHashIterator(aggstate->perhash[0].hashtable,
 									   &aggstate->perhash[0].hashiter);
-				select_current_set(aggstate, 0, true);
+				select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
 				return agg_retrieve_hash_table(aggstate);
 			}
 			else
@@ -2601,7 +2764,7 @@ agg_retrieve_direct(AggState *aggstate)
 
 		prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
 
-		select_current_set(aggstate, currentSet, false);
+		select_current_set(aggstate, currentSet, GROUPING_STRATEGY_SORT);
 
 		finalize_aggregates(aggstate,
 							peragg,
@@ -2683,19 +2846,19 @@ agg_refill_hash_table(AggState *aggstate)
 	HashAggBatch *batch;
 	AggStatePerHash perhash;
 	HashAggSpill spill;
-	LogicalTapeSet *tapeset = aggstate->hash_tapeset;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
 	bool		spill_initialized = false;
 
-	if (aggstate->hash_batches == NIL)
+	if (aggstate->spill_batches == NIL)
 		return false;
 
 	/* hash_batches is a stack, with the top item at the end of the list */
-	batch = llast(aggstate->hash_batches);
-	aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
+	batch = llast(aggstate->spill_batches);
+	aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
 
-	hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
-						batch->used_bits, &aggstate->hash_mem_limit,
-						&aggstate->hash_ngroups_limit, NULL);
+	agg_set_limits(aggstate->hashentrysize, batch->input_card,
+				   batch->used_bits, &aggstate->spill_mem_limit,
+				   &aggstate->spill_ngroups_limit, NULL);
 
 	/*
 	 * Each batch only processes one grouping set; set the rest to NULL so
@@ -2712,7 +2875,7 @@ agg_refill_hash_table(AggState *aggstate)
 	for (int setno = 0; setno < aggstate->num_hashes; setno++)
 		ResetTupleHashTable(aggstate->perhash[setno].hashtable);
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 
 	/*
 	 * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
@@ -2726,7 +2889,7 @@ agg_refill_hash_table(AggState *aggstate)
 		aggstate->phase = &aggstate->phases[aggstate->current_phase];
 	}
 
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 
 	perhash = &aggstate->perhash[aggstate->current_set];
 
@@ -2737,19 +2900,19 @@ agg_refill_hash_table(AggState *aggstate)
 	 * We still need the NULL check, because we are only processing one
 	 * grouping set at a time and the rest will be NULL.
 	 */
-	hashagg_recompile_expressions(aggstate, true, true);
+	agg_recompile_expressions(aggstate, true, true);
 
 	INJECTION_POINT("hash-aggregate-process-batch", NULL);
 	for (;;)
 	{
-		TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
+		TupleTableSlot *spillslot = aggstate->spill_rslot;
 		TupleTableSlot *hashslot = perhash->hashslot;
 		TupleHashTable hashtable = perhash->hashtable;
 		TupleHashEntry entry;
 		MinimalTuple tuple;
 		uint32		hash;
 		bool		isnew = false;
-		bool	   *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		bool	   *p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2782,11 +2945,11 @@ agg_refill_hash_table(AggState *aggstate)
 				 * that we don't assign tapes that will never be used.
 				 */
 				spill_initialized = true;
-				hashagg_spill_init(&spill, tapeset, batch->used_bits,
-								   batch->input_card, aggstate->hashentrysize);
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
 			}
 			/* no memory for a new group, spill */
-			hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
 
 			aggstate->hash_pergroup[batch->setno] = NULL;
 		}
@@ -2806,16 +2969,16 @@ agg_refill_hash_table(AggState *aggstate)
 
 	if (spill_initialized)
 	{
-		hashagg_spill_finish(aggstate, &spill, batch->setno);
+		agg_spill_finish(aggstate, &spill, batch->setno);
 		hash_agg_update_metrics(aggstate, true, spill.npartitions);
 	}
 	else
 		hash_agg_update_metrics(aggstate, true, 0);
 
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 
 	/* prepare to walk the first hash table */
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 	ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
 						   &aggstate->perhash[batch->setno].hashiter);
 
@@ -2975,14 +3138,14 @@ agg_retrieve_hash_table_in_memory(AggState *aggstate)
 }
 
 /*
- * hashagg_spill_init
+ * agg_spill_init
  *
  * Called after we determined that spilling is necessary. Chooses the number
  * of partitions to create, and initializes them.
  */
 static void
-hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
-				   double input_groups, double hashentrysize)
+agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
+			   double input_groups, double hashentrysize)
 {
 	int			npartitions;
 	int			partition_bits;
@@ -3018,14 +3181,13 @@ hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
 }
 
 /*
- * hashagg_spill_tuple
+ * agg_spill_tuple
  *
- * No room for new groups in the hash table. Save for later in the appropriate
- * partition.
+ * No room for new groups in memory. Save for later in the appropriate partition.
  */
 static Size
-hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-					TupleTableSlot *inputslot, uint32 hash)
+agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+				TupleTableSlot *inputslot, uint32 hash)
 {
 	TupleTableSlot *spillslot;
 	int			partition;
@@ -3039,7 +3201,7 @@ hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
 	/* spill only attributes that we actually need */
 	if (!aggstate->all_cols_needed)
 	{
-		spillslot = aggstate->hash_spill_wslot;
+		spillslot = aggstate->spill_wslot;
 		slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
 		ExecClearTuple(spillslot);
 		for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
@@ -3167,14 +3329,14 @@ hashagg_finish_initial_spills(AggState *aggstate)
 	int			setno;
 	int			total_npartitions = 0;
 
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			total_npartitions += spill->npartitions;
-			hashagg_spill_finish(aggstate, spill, setno);
+			agg_spill_finish(aggstate, spill, setno);
 		}
 
 		/*
@@ -3182,21 +3344,21 @@ hashagg_finish_initial_spills(AggState *aggstate)
 		 * processing batches of spilled tuples. The initial spill structures
 		 * are no longer needed.
 		 */
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	hash_agg_update_metrics(aggstate, false, total_npartitions);
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 }
 
 /*
- * hashagg_spill_finish
+ * agg_spill_finish
  *
  * Transform spill partitions into new batches.
  */
 static void
-hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
+agg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 {
 	int			i;
 	int			used_bits = 32 - spill->shift;
@@ -3223,8 +3385,8 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 		new_batch = hashagg_batch_new(tape, setno,
 									  spill->ntuples[i], cardinality,
 									  used_bits);
-		aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
-		aggstate->hash_batches_used++;
+		aggstate->spill_batches = lappend(aggstate->spill_batches, new_batch);
+		aggstate->spill_batches_used++;
 	}
 
 	pfree(spill->ntuples);
@@ -3239,33 +3401,668 @@ static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
 	/* free spills from initial pass */
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		int			setno;
 
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			pfree(spill->ntuples);
 			pfree(spill->partitions);
 		}
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
+	}
+
+	/* free batches */
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
+
+	/* close tape set */
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+static void
+agg_fill_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *tmpcontext = aggstate->tmpcontext;
+
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *outerslot;
+
+		outerslot = fetch_input_tuple(aggstate);
+		if (TupIsNull(outerslot))
+			break;
+
+		/* set up for lookup_index_entries and advance_aggregates */
+		tmpcontext->ecxt_outertuple = outerslot;
+
+		/* insert the input tuple into the index, possibly spilling to disk */
+		lookup_index_entries(aggstate);
+
+		/* Advance the aggregates (or combine functions) */
+		advance_aggregates(aggstate);
+
+		/*
+		 * Reset per-input-tuple context after each tuple, but note that the
+		 * index lookups do this too
+		 */
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	/*
+	 * Mark the index as filled here, so that after recompilation the
+	 * expression will expect a MinimalTuple instead of the outer plan's type.
+	 */
+	aggstate->index_filled = true;
+
+	indexagg_finish_initial_spills(aggstate);
+
+	/*
+	 * This is only useful when no spill occurred and projection happens in
+	 * memory, but initialize it anyway.
+	 */
+	select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
+	InitTupleIndexIterator(perindex->index, &perindex->iter);
+}
+
+/*
+ * Extract the attributes that make up the grouping key into the indexslot.
+ * This is necessary to perform comparisons in the index.
+ */
+static void
+prepare_index_slot(AggStatePerIndex perindex,
+				   TupleTableSlot *inputslot,
+				   TupleTableSlot *indexslot)
+{
+	slot_getsomeattrs(inputslot, perindex->largestGrpColIdx);
+	ExecClearTuple(indexslot);
+
+	for (int i = 0; i < perindex->numCols; ++i)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		indexslot->tts_values[i] = inputslot->tts_values[varNumber];
+		indexslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
+	}
+	ExecStoreVirtualTuple(indexslot);
+}
+
+static void
+indexagg_reset_spill_state(AggState *aggstate)
+{
+	/* free spills from initial pass */
+	if (aggstate->spills != NULL)
+	{
+		HashAggSpill *spill = &aggstate->spills[0];
+		pfree(spill->ntuples);
+		pfree(spill->partitions);
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	/* free batches */
-	list_free_deep(aggstate->hash_batches);
-	aggstate->hash_batches = NIL;
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
 
 	/* close tape set */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * Initialize a freshly-created index entry.
+ */
+static void
+initialize_index_entry(AggState *aggstate, TupleIndex index, TupleIndexEntry entry)
+{
+	AggStatePerGroup pergroup;
+
+	aggstate->spill_ngroups_current++;
+	index_agg_check_limits(aggstate);
+
+	/* no need to allocate or initialize per-group state */
+	if (aggstate->numtrans == 0)
+		return;
+
+	pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(index, entry);
+
+	/*
+	 * Initialize aggregates for the new tuple group; lookup_index_entries()
+	 * has already selected the relevant grouping set.
+	 */
+	for (int transno = 0; transno < aggstate->numtrans; ++transno)
+	{
+		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		AggStatePerGroup pergroupstate = &pergroup[transno];
+
+		initialize_aggregate(aggstate, pertrans, pergroupstate);
+	}
+}
+
+/*
+ * Create a new sorted run from the current in-memory index.
+ */
+static void
+indexagg_save_index_run(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *econtext;
+	TupleIndexIteratorData iter;
+	AggStatePerAgg peragg;
+	TupleTableSlot *firstSlot;
+	TupleIndexEntry entry;
+	TupleTableSlot *indexslot;
+	AggStatePerGroup pergroup;
+
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	indexslot = perindex->indexslot;
+
+	InitTupleIndexIterator(perindex->index, &iter);
+
+	tuplemerge_start_run(aggstate->mergestate);
+
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
 	{
-		LogicalTapeSetClose(aggstate->hash_tapeset);
-		aggstate->hash_tapeset = NULL;
+		MinimalTuple tuple = TupleIndexEntryGetMinimalTuple(entry);
+		TupleTableSlot *output;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(tuple, indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		output = project_aggregates(aggstate);
+		if (output)
+			tuplemerge_puttupleslot(aggstate->mergestate, output);
 	}
+
+	tuplemerge_end_run(aggstate->mergestate);
 }
 
+/*
+ * Refill the in-memory index with the tuples of the given batch.
+ */
+static void
+indexagg_refill_batch(AggState *aggstate, HashAggBatch *batch)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *spillslot = aggstate->spill_rslot;
+	TupleTableSlot *indexslot = perindex->indexslot;
+	TupleIndex index = perindex->index;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
+	HashAggSpill spill;
+	bool	spill_initialized = false;
+
+	agg_set_limits(aggstate->hashentrysize, batch->input_card, batch->used_bits,
+				   &aggstate->spill_mem_limit, &aggstate->spill_ngroups_limit, NULL);
+
+	ReScanExprContext(aggstate->indexcontext);
+
+	MemoryContextReset(aggstate->index_entrycxt);
+	MemoryContextReset(aggstate->index_nodecxt);
+	ResetTupleIndex(perindex->index);
+
+	aggstate->spill_ngroups_current = 0;
+
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	agg_recompile_expressions(aggstate, true, true);
+
+	for (;;)
+	{
+		MinimalTuple tuple;
+		TupleIndexEntry entry;
+		bool		isnew = false;
+		bool	   *p_isnew;
+		uint32		hash;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		tuple = hashagg_batch_read(batch, &hash);
+		if (tuple == NULL)
+			break;
+
+		ExecStoreMinimalTuple(tuple, spillslot, true);
+		aggstate->tmpcontext->ecxt_outertuple = spillslot;
+
+		prepare_index_slot(perindex, spillslot, indexslot);
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		entry = TupleIndexLookup(index, indexslot, p_isnew);
+
+		if (entry != NULL)
+		{
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			aggstate->all_pergroups[batch->setno] = TupleIndexEntryGetAdditional(index, entry);
+			advance_aggregates(aggstate);
+		}
+		else
+		{
+			if (!spill_initialized)
+			{
+				spill_initialized = true;
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
+			}
+
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
+			aggstate->all_pergroups[batch->setno] = NULL;
+		}
+		
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	LogicalTapeClose(batch->input_tape);
+
+	if (spill_initialized)
+	{
+		agg_spill_finish(aggstate, &spill, 0);
+		index_agg_update_metrics(aggstate, true, spill.npartitions);
+	}
+	else
+		index_agg_update_metrics(aggstate, true, 0);
+
+	aggstate->spill_mode = false;
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	pfree(batch);
+}
+
+static void
+indexagg_finish_initial_spills(AggState *aggstate)
+{
+	HashAggSpill *spill;
+	AggStatePerIndex perindex;
+	Sort		 *sort;
+
+	if (!aggstate->spill_ever_happened)
+		return;
+
+	Assert(aggstate->spills != NULL);
+
+	spill = aggstate->spills;
+	agg_spill_finish(aggstate, aggstate->spills, 0);
+
+	index_agg_update_metrics(aggstate, false, spill->npartitions);
+	aggstate->spill_mode = false;
+
+	pfree(aggstate->spills);
+	aggstate->spills = NULL;
+
+	perindex = aggstate->perindex;
+	sort = aggstate->index_sort;
+	aggstate->mergestate = tuplemerge_begin_heap(aggstate->ss.ps.ps_ResultTupleDesc,
+												 perindex->numKeyCols,
+												 perindex->idxKeyColIdxTL,
+												 sort->sortOperators,
+												 sort->collations,
+												 sort->nullsFirst,
+												 work_mem, NULL);
+	/*
+	 * Some data was spilled.  Index Aggregate requires the output to be
+	 * sorted, so now we must process all remaining spilled data and produce
+	 * sorted runs for the external merge.  The first saved run is the
+	 * currently open in-memory index.
+	 */
+	indexagg_save_index_run(aggstate);
+
+	while (aggstate->spill_batches != NIL)
+	{
+		HashAggBatch *batch = llast(aggstate->spill_batches);
+		aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
+
+		indexagg_refill_batch(aggstate, batch);
+		indexagg_save_index_run(aggstate);
+	}
+
+	tuplemerge_performmerge(aggstate->mergestate);
+}
+
+static uint32
+index_calculate_input_slot_hash(AggState *aggstate,
+								TupleTableSlot *inputslot)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext oldcxt;
+	uint32 hash;
+	bool isnull;
+	
+	oldcxt = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+	
+	perindex->exprcontext->ecxt_innertuple = inputslot;
+	hash = DatumGetUInt32(ExecEvalExpr(perindex->indexhashexpr,
+									   perindex->exprcontext,
+									   &isnull));
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return hash;
+}
+
+/*
+ * lookup_index_entries
+ *
+ * Insert input tuples into the in-memory index.
+ */
+static void
+lookup_index_entries(AggState *aggstate)
+{
+	int numGroupingSets = Max(aggstate->maxsets, 1);
+	AggStatePerGroup *pergroup = aggstate->all_pergroups;
+	TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
+
+	for (int setno = 0; setno < numGroupingSets; ++setno)
+	{
+		AggStatePerIndex	perindex = &aggstate->perindex[setno];
+		TupleIndex		index = perindex->index;
+		TupleTableSlot *indexslot = perindex->indexslot;
+		TupleIndexEntry	entry;
+		bool			isnew = false;
+		bool		   *p_isnew;
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_INDEX);
+
+		prepare_index_slot(perindex, outerslot, indexslot);
+
+		/* Lookup entry in btree */
+		entry = TupleIndexLookup(perindex->index, indexslot, p_isnew);
+
+		/* Entry is stored in memory - no disk spill needed for this tuple */
+		if (entry != NULL)
+		{
+			/* Initialize its trans state if just created */
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			pergroup[setno] = TupleIndexEntryGetAdditional(index, entry);
+		}
+		else
+		{
+			HashAggSpill *spill = &aggstate->spills[setno];
+			uint32 hash;
+			
+			if (spill->partitions == NULL)
+			{
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perindex->aggnode->numGroups,
+							   aggstate->hashentrysize);
+			}
+
+			hash = index_calculate_input_slot_hash(aggstate, indexslot);
+			agg_spill_tuple(aggstate, spill, outerslot, hash);
+			pergroup[setno] = NULL;
+		}
+	}
+}
+
+static TupleTableSlot *
+agg_retrieve_index_in_memory(AggState *aggstate)
+{
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+	AggStatePerIndex perindex;
+	AggStatePerAgg peragg;
+	AggStatePerGroup pergroup;
+	TupleTableSlot *result;
+	
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	perindex = &aggstate->perindex[aggstate->current_set];
+
+	for (;;)
+	{
+		TupleIndexEntry entry;
+		TupleTableSlot *indexslot = perindex->indexslot;
+
+		CHECK_FOR_INTERRUPTS();
+		
+		entry = TupleIndexIteratorNext(&perindex->iter);
+		if (entry == NULL)
+			return NULL;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(TupleIndexEntryGetMinimalTuple(entry), indexslot, false);
+		slot_getallattrs(indexslot);
+		
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+		
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+		
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		result = project_aggregates(aggstate);
+		if (result)
+			return result;
+	}
+}
+
+static TupleTableSlot *
+agg_retrieve_index_merge(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *slot = perindex->mergeslot;
+	TupleTableSlot *resultslot = aggstate->ss.ps.ps_ResultTupleSlot;
+	
+	ExecClearTuple(slot);
+	
+	if (!tuplesort_gettupleslot(aggstate->mergestate, true, true, slot, NULL))
+		return NULL;
+
+	slot_getallattrs(slot);
+	ExecClearTuple(resultslot);
+	
+	for (int i = 0; i < resultslot->tts_tupleDescriptor->natts; ++i)
+	{
+		resultslot->tts_values[i] = slot->tts_values[i];
+		resultslot->tts_isnull[i] = slot->tts_isnull[i];
+	}
+	ExecStoreVirtualTuple(resultslot);
+
+	return resultslot;
+}
+
+static TupleTableSlot *
+agg_retrieve_index(AggState *aggstate)
+{
+	if (aggstate->spill_ever_happened)
+		return agg_retrieve_index_merge(aggstate);
+	else
+		return agg_retrieve_index_in_memory(aggstate);
+}
+
+static void
+build_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext metacxt = aggstate->index_metacxt;
+	MemoryContext entrycxt = aggstate->index_entrycxt;
+	MemoryContext nodecxt = aggstate->index_nodecxt;
+	MemoryContext oldcxt;
+	Size	additionalsize;
+	Oid	   *eqfuncoids;
+	Sort   *sort;
+
+	Assert(aggstate->aggstrategy == AGG_INDEX);
+
+	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
+	sort = aggstate->index_sort;
+
+	/* inmem index */
+	perindex->index = BuildTupleIndex(perindex->indexslot->tts_tupleDescriptor,
+									  perindex->numKeyCols,
+									  perindex->idxKeyColIdxIndex,
+									  sort->sortOperators,
+									  sort->collations,
+									  sort->nullsFirst,
+									  additionalsize,
+									  metacxt,
+									  entrycxt,
+									  nodecxt);
+
+	/* disk spill logic */
+	oldcxt = MemoryContextSwitchTo(metacxt);
+	execTuplesHashPrepare(perindex->numKeyCols, perindex->aggnode->grpOperators,
+						  &eqfuncoids, &perindex->hashfunctions);
+	perindex->indexhashexpr =
+		ExecBuildHash32FromAttrs(perindex->indexslot->tts_tupleDescriptor,
+								 perindex->indexslot->tts_ops,
+								 perindex->hashfunctions,
+								 perindex->aggnode->grpCollations,
+								 perindex->numKeyCols,
+								 perindex->idxKeyColIdxIndex,
+								 &aggstate->ss.ps,
+								 0);
+	perindex->exprcontext = CreateStandaloneExprContext();
+	MemoryContextSwitchTo(oldcxt);
+}
+
+static void
+find_index_columns(AggState *aggstate)
+{
+	Bitmapset  *base_colnos;
+	Bitmapset  *aggregated_colnos;
+	TupleDesc	scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	List	   *outerTlist = outerPlanState(aggstate)->plan->targetlist;
+	EState	   *estate = aggstate->ss.ps.state;
+	AggStatePerIndex perindex;
+	Bitmapset  *colnos;
+	AttrNumber *sortColIdx;
+	List	   *indexTlist = NIL;
+	TupleDesc   indexDesc;
+	int			maxCols;
+	int			i;
+
+	find_cols(aggstate, &aggregated_colnos, &base_colnos);
+	aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
+	aggstate->max_colno_needed = 0;
+	aggstate->all_cols_needed = true;
+
+	for (i = 0; i < scanDesc->natts; i++)
+	{
+		int		colno = i + 1;
+
+		if (bms_is_member(colno, aggstate->colnos_needed))
+			aggstate->max_colno_needed = colno;
+		else
+			aggstate->all_cols_needed = false;
+	}
+
+	perindex = aggstate->perindex;
+	colnos = bms_copy(base_colnos);
+
+	if (aggstate->phases[0].grouped_cols)
+	{
+		Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[0];
+		ListCell  *lc;
+		foreach(lc, aggstate->all_grouped_cols)
+		{
+			int attnum = lfirst_int(lc);
+			if (!bms_is_member(attnum, grouped_cols))
+				colnos = bms_del_member(colnos, attnum);
+		}
+	}
+
+	maxCols = bms_num_members(colnos) + perindex->numKeyCols;
+
+	perindex->idxKeyColIdxInput = palloc(maxCols * sizeof(AttrNumber));
+	perindex->idxKeyColIdxIndex = palloc(perindex->numKeyCols * sizeof(AttrNumber));
+
+	/* Add all the sorting/grouping columns to colnos */
+	sortColIdx = aggstate->index_sort->sortColIdx;
+	for (i = 0; i < perindex->numKeyCols; i++)
+		colnos = bms_add_member(colnos, sortColIdx[i]);
+	
+	for (i = 0; i < perindex->numKeyCols; i++)
+	{
+		perindex->idxKeyColIdxInput[i] = sortColIdx[i];
+		perindex->idxKeyColIdxIndex[i] = i + 1;
+
+		perindex->numCols++;
+		/* delete already mapped columns */
+		colnos = bms_del_member(colnos, sortColIdx[i]);
+	}
+	
+	/* and the remaining columns */
+	i = -1;
+	while ((i = bms_next_member(colnos, i)) >= 0)
+	{
+		perindex->idxKeyColIdxInput[perindex->numCols] = i;
+		perindex->numCols++;
+	}
+
+	/* build tuple descriptor for the index */
+	perindex->largestGrpColIdx = 0;
+	for (i = 0; i < perindex->numCols; i++)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		
+		indexTlist = lappend(indexTlist, list_nth(outerTlist, varNumber));
+		perindex->largestGrpColIdx = Max(varNumber + 1, perindex->largestGrpColIdx);
+	}
+
+	indexDesc = ExecTypeFromTL(indexTlist);
+	perindex->indexslot = ExecAllocTableSlot(&estate->es_tupleTable, indexDesc,
+										   &TTSOpsMinimalTuple);
+	list_free(indexTlist);
+	bms_free(colnos);
+
+	bms_free(base_colnos);
+}
 
 /* -----------------
  * ExecInitAgg
@@ -3297,10 +4094,12 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	int			numGroupingSets = 1;
 	int			numPhases;
 	int			numHashes;
+	int			numIndexes;
 	int			i = 0;
 	int			j = 0;
 	bool		use_hashing = (node->aggstrategy == AGG_HASHED ||
 							   node->aggstrategy == AGG_MIXED);
+	bool		use_index = (node->aggstrategy == AGG_INDEX);
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -3337,6 +4136,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	 */
 	numPhases = (use_hashing ? 1 : 2);
 	numHashes = (use_hashing ? 1 : 0);
+	numIndexes = (use_index ? 1 : 0);
 
 	/*
 	 * Calculate the maximum number of grouping sets in any phase; this
@@ -3356,7 +4156,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 			/*
 			 * additional AGG_HASHED aggs become part of phase 0, but all
-			 * others add an extra phase.
+			 * others add an extra phase.  AGG_INDEX does not support grouping
+			 * sets, so the else branch must be AGG_SORTED or AGG_MIXED.
 			 */
 			if (agg->aggstrategy != AGG_HASHED)
 				++numPhases;
@@ -3395,6 +4196,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (use_hashing)
 		hash_create_memory(aggstate);
+	else if (use_index)
+		index_create_memory(aggstate);
 
 	ExecAssignExprContext(estate, &aggstate->ss.ps);
 
@@ -3501,6 +4304,13 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		aggstate->phases[0].gset_lengths = palloc_array(int, numHashes);
 		aggstate->phases[0].grouped_cols = palloc_array(Bitmapset *, numHashes);
 	}
+	else if (numIndexes)
+	{
+		aggstate->perindex = palloc0(sizeof(AggStatePerIndexData) * numIndexes);
+		aggstate->phases[0].numsets = 0;
+		aggstate->phases[0].gset_lengths = palloc(numIndexes * sizeof(int));
+		aggstate->phases[0].grouped_cols = palloc(numIndexes * sizeof(Bitmapset *));
+	}
 
 	phase = 0;
 	for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
@@ -3513,6 +4323,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
 			sortnode = castNode(Sort, outerPlan(aggnode));
 		}
+		else if (use_index)
+		{
+			Assert(list_length(node->chain) == 1);
+
+			aggnode = node;
+			sortnode = castNode(Sort, linitial(node->chain));
+			/*
+			 * The chain contains a single element, so advance the loop
+			 * variable to make this the only iteration.
+			 */
+			phaseidx++;
+		}
 		else
 		{
 			aggnode = node;
@@ -3549,6 +4371,35 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
 			continue;
 		}
+		else if (aggnode->aggstrategy == AGG_INDEX)
+		{
+			AggStatePerPhase phasedata = &aggstate->phases[0];
+			AggStatePerIndex perindex;
+			Bitmapset *cols;
+			
+			Assert(phase == 0);
+			Assert(sortnode);
+
+			i = phasedata->numsets++;
+			
+			/* phase 0 always points to the "real" Agg in the index case */
+			phasedata->aggnode = node;
+			phasedata->aggstrategy = node->aggstrategy;
+			phasedata->sortnode = sortnode;
+
+			perindex = &aggstate->perindex[i];
+			perindex->aggnode = aggnode;
+			aggstate->index_sort = sortnode;
+
+			phasedata->gset_lengths[i] = perindex->numKeyCols = aggnode->numCols;
+
+			cols = NULL;
+			for (j = 0; j < aggnode->numCols; ++j)
+				cols = bms_add_member(cols, aggnode->grpColIdx[j]);
+				
+			phasedata->grouped_cols[i] = cols;
+			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
+		}
 		else
 		{
 			AggStatePerPhase phasedata = &aggstate->phases[++phase];
@@ -3666,7 +4517,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	aggstate->all_pergroups = palloc0_array(AggStatePerGroup, numGroupingSets + numHashes);
 	pergroups = aggstate->all_pergroups;
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy != AGG_HASHED && node->aggstrategy != AGG_INDEX)
 	{
 		for (i = 0; i < numGroupingSets; i++)
 		{
@@ -3680,18 +4531,15 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	/*
 	 * Hashing can only appear in the initial phase.
 	 */
-	if (use_hashing)
+	if (use_hashing || use_index)
 	{
 		Plan	   *outerplan = outerPlan(node);
 		double		totalGroups = 0;
 
-		aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsMinimalTuple);
-		aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsVirtual);
-
-		/* this is an array of pointers, not structures */
-		aggstate->hash_pergroup = pergroups;
+		aggstate->spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsMinimalTuple);
+		aggstate->spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsVirtual);
 
 		aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
 													  outerplan->plan_width,
@@ -3706,20 +4554,115 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		for (int k = 0; k < aggstate->num_hashes; k++)
 			totalGroups += aggstate->perhash[k].aggnode->numGroups;
 
-		hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
-							&aggstate->hash_mem_limit,
-							&aggstate->hash_ngroups_limit,
-							&aggstate->hash_planned_partitions);
-		find_hash_columns(aggstate);
+		agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
+					   &aggstate->spill_mem_limit,
+					   &aggstate->spill_ngroups_limit,
+					   &aggstate->spill_planned_partitions);
+
+		if (use_hashing)
+		{
+			/* this is an array of pointers, not structures */
+			aggstate->hash_pergroup = pergroups;
+	
+			find_hash_columns(aggstate);
+
+			/* Skip massive memory allocation if we are just doing EXPLAIN */
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_hash_tables(aggstate);
+			aggstate->table_filled = false;
+		}
+		else
+		{
+			find_index_columns(aggstate);
 
-		/* Skip massive memory allocation if we are just doing EXPLAIN */
-		if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-			build_hash_tables(aggstate);
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_index(aggstate);
+			aggstate->index_filled = false;
+		}
 
-		aggstate->table_filled = false;
 
 		/* Initialize this to 1, meaning nothing spilled, yet */
-		aggstate->hash_batches_used = 1;
+		aggstate->spill_batches_used = 1;
+	}
+
+	/*
+	 * For Index Aggregate, a disk spill may be required, in which case we
+	 * perform an external merge.  The spilled tuples are already projected,
+	 * so they have a different TupleDesc than the ones used in memory
+	 * (inputDesc and indexDesc).
+	 */
+	if (use_index)
+	{
+		AggStatePerIndex perindex = aggstate->perindex;
+		ListCell *lc;
+		List *targetlist = aggstate->ss.ps.plan->targetlist;
+		AttrNumber *attr_mapping_tl = 
+						palloc0(sizeof(AttrNumber) * list_length(targetlist));
+		AttrNumber *keyColIdxResult;
+
+		/*
+		 * Build the grouping column attribute mapping and store it in
+		 * attr_mapping_tl.  If there is no such mapping (the column is
+		 * projected away), InvalidAttrNumber is set; otherwise, the index
+		 * of the indexDesc column storing this attribute.
+		 */
+		foreach (lc, targetlist)
+		{
+			TargetEntry *te = (TargetEntry *)lfirst(lc);
+			Var *group_var;
+
+			/* All grouping expressions in targetlist stored as OUTER Vars */
+			if (!IsA(te->expr, Var))
+				continue;
+			
+			group_var = (Var *)te->expr;
+			if (group_var->varno != OUTER_VAR)
+				continue;
+
+			attr_mapping_tl[foreach_current_index(lc)] = group_var->varattno;
+		}
+
+		/* Mapping is built and now create reverse mapping */
+		keyColIdxResult = palloc0(sizeof(AttrNumber) * list_length(outerPlan(node)->targetlist));
+		for (i = 0; i < list_length(targetlist); ++i)
+		{
+			AttrNumber outer_attno = attr_mapping_tl[i];
+			AttrNumber existingIdx;
+
+			if (!AttributeNumberIsValid(outer_attno))
+				continue;
+
+			existingIdx = keyColIdxResult[outer_attno - 1];
+			
+			/* attnumbers can be duplicated, so use the first occurrence */
+			if (AttributeNumberIsValid(existingIdx) && existingIdx <= outer_attno)
+				continue;
+
+			/*
+			 * A column can be referenced in the query, but the planner may
+			 * decide to remove it from grouping.
+			 */
+			if (!bms_is_member(outer_attno, all_grouped_cols))
+				continue;
+
+			keyColIdxResult[outer_attno - 1] = i + 1;
+		}
+
+		perindex->idxKeyColIdxTL = palloc(sizeof(AttrNumber) * perindex->numKeyCols);
+		for (i = 0; i < perindex->numKeyCols; ++i)
+		{
+			AttrNumber attno = keyColIdxResult[perindex->idxKeyColIdxInput[i] - 1];
+			if (!AttributeNumberIsValid(attno))
+				elog(ERROR, "could not locate group by attributes in targetlist for index mapping");
+
+			perindex->idxKeyColIdxTL[i] = attno;
+		}
+
+		pfree(attr_mapping_tl);
+		pfree(keyColIdxResult);
+
+		perindex->mergeslot = ExecInitExtraTupleSlot(estate,
+													 aggstate->ss.ps.ps_ResultTupleDesc, 
+													 &TTSOpsMinimalTuple);
 	}
 
 	/*
@@ -3732,13 +4675,19 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		aggstate->current_phase = 0;
 		initialize_phase(aggstate, 0);
-		select_current_set(aggstate, 0, true);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
+	}
+	else if (node->aggstrategy == AGG_INDEX)
+	{
+		aggstate->current_phase = 0;
+		initialize_phase(aggstate, 0);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
 	}
 	else
 	{
 		aggstate->current_phase = 1;
 		initialize_phase(aggstate, 1);
-		select_current_set(aggstate, 0, false);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_SORT);
 	}
 
 	/*
@@ -4066,8 +5015,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
 	{
 		AggStatePerPhase phase = &aggstate->phases[phaseidx];
-		bool		dohash = false;
-		bool		dosort = false;
+		int			strategy;
 
 		/* phase 0 doesn't necessarily exist */
 		if (!phase->aggnode)
@@ -4079,8 +5027,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			 * Phase one, and only phase one, in a mixed agg performs both
 			 * sorting and aggregation.
 			 */
-			dohash = true;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_HASH | GROUPING_STRATEGY_SORT;
 		}
 		else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
 		{
@@ -4094,19 +5041,24 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		else if (phase->aggstrategy == AGG_PLAIN ||
 				 phase->aggstrategy == AGG_SORTED)
 		{
-			dohash = false;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_SORT;
 		}
 		else if (phase->aggstrategy == AGG_HASHED)
 		{
-			dohash = true;
-			dosort = false;
+			strategy = GROUPING_STRATEGY_HASH;
+		}
+		else if (phase->aggstrategy == AGG_INDEX)
+		{
+			strategy = GROUPING_STRATEGY_INDEX;
 		}
 		else
+		{
 			Assert(false);
+			/* keep compiler quiet */
+			strategy = 0;
+		}
 
-		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
-											 false);
+		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, strategy, false);
 
 		/* cache compiled expression for outer slot without NULL check */
 		phase->evaltrans_cache[0][0] = phase->evaltrans;
@@ -4409,9 +5361,9 @@ ExecEndAgg(AggState *node)
 
 		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
 		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		si->hash_batches_used = node->hash_batches_used;
-		si->hash_disk_used = node->hash_disk_used;
-		si->hash_mem_peak = node->hash_mem_peak;
+		si->hash_batches_used = node->spill_batches_used;
+		si->hash_disk_used = node->spill_disk_used;
+		si->hash_mem_peak = node->spill_mem_peak;
 	}
 
 	/* Make sure we have closed any open tuplesorts */
@@ -4421,7 +5373,10 @@ ExecEndAgg(AggState *node)
 	if (node->sort_out)
 		tuplesort_end(node->sort_out);
 
-	hashagg_reset_spill_state(node);
+	if (node->aggstrategy == AGG_INDEX)
+		indexagg_reset_spill_state(node);
+	else
+		hashagg_reset_spill_state(node);
 
 	/* Release hash tables too */
 	if (node->hash_metacxt != NULL)
@@ -4434,6 +5389,26 @@ ExecEndAgg(AggState *node)
 		MemoryContextDelete(node->hash_tuplescxt);
 		node->hash_tuplescxt = NULL;
 	}
+	if (node->index_metacxt != NULL)
+	{
+		MemoryContextDelete(node->index_metacxt);
+		node->index_metacxt = NULL;
+	}
+	if (node->index_entrycxt != NULL)
+	{
+		MemoryContextDelete(node->index_entrycxt);
+		node->index_entrycxt = NULL;
+	}
+	if (node->index_nodecxt != NULL)
+	{
+		MemoryContextDelete(node->index_nodecxt);
+		node->index_nodecxt = NULL;
+	}
+	if (node->mergestate)
+	{
+		tuplesort_end(node->mergestate);
+		node->mergestate = NULL;
+	}
 
 	for (transno = 0; transno < node->numtrans; transno++)
 	{
@@ -4451,6 +5426,8 @@ ExecEndAgg(AggState *node)
 		ReScanExprContext(node->aggcontexts[setno]);
 	if (node->hashcontext)
 		ReScanExprContext(node->hashcontext);
+	if (node->indexcontext)
+		ReScanExprContext(node->indexcontext);
 
 	outerPlan = outerPlanState(node);
 	ExecEndNode(outerPlan);
@@ -4486,12 +5463,27 @@ ExecReScanAgg(AggState *node)
 		 * we can just rescan the existing hash table; no need to build it
 		 * again.
 		 */
-		if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
 			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
 		{
 			ResetTupleHashIterator(node->perhash[0].hashtable,
 								   &node->perhash[0].hashiter);
-			select_current_set(node, 0, true);
+			select_current_set(node, 0, GROUPING_STRATEGY_HASH);
+			return;
+		}
+	}
+
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		if (!node->index_filled)
+			return;
+
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
+			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
+		{
+			AggStatePerIndex perindex = node->perindex;
+			ResetTupleIndexIterator(perindex->index, &perindex->iter);
+			select_current_set(node, 0, GROUPING_STRATEGY_INDEX);
 			return;
 		}
 	}
@@ -4545,9 +5537,9 @@ ExecReScanAgg(AggState *node)
 	{
 		hashagg_reset_spill_state(node);
 
-		node->hash_ever_spilled = false;
-		node->hash_spill_mode = false;
-		node->hash_ngroups_current = 0;
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
 
 		ReScanExprContext(node->hashcontext);
 		/* Rebuild empty hash table(s) */
@@ -4555,10 +5547,33 @@ ExecReScanAgg(AggState *node)
 		node->table_filled = false;
 		/* iterator will be reset when the table is filled */
 
-		hashagg_recompile_expressions(node, false, false);
+		agg_recompile_expressions(node, false, false);
 	}
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		indexagg_reset_spill_state(node);
+
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
+		
+		ReScanExprContext(node->indexcontext);
+		MemoryContextReset(node->index_entrycxt);
+		MemoryContextReset(node->index_nodecxt);
+
+		build_index(node);
+		node->index_filled = false;
+
+		agg_recompile_expressions(node, false, false);
+
+		if (node->mergestate)
+		{
+			tuplesort_end(node->mergestate);
+			node->mergestate = NULL;
+		}
+	}
+	else if (node->aggstrategy != AGG_HASHED)
 	{
 		/*
 		 * Reset the per-group state (in particular, mark transvalues null)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 88ae529e843..fc349707778 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1900,6 +1900,7 @@ static void
 inittapestate(Tuplesortstate *state, int maxTapes)
 {
 	int64		tapeSpace;
+	Size		memtuplesSize;
 
 	/*
 	 * Decrease availMem to reflect the space needed for tape buffers; but
@@ -1912,7 +1913,16 @@ inittapestate(Tuplesortstate *state, int maxTapes)
 	 */
 	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
 
-	if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
+	/*
+	 * In merge-only state we do not use the in-memory tuple array during
+	 * initial run creation; we write to the tapes directly.
+	 */
+	if (state->memtuples != NULL)
+		memtuplesSize = GetMemoryChunkSpace(state->memtuples);
+	else
+		memtuplesSize = 0;
+
+	if (tapeSpace + memtuplesSize < state->allowedMem)
 		USEMEM(state, tapeSpace);
 
 	/*
@@ -2031,11 +2041,14 @@ mergeruns(Tuplesortstate *state)
 
 	/*
 	 * We no longer need a large memtuples array.  (We will allocate a smaller
-	 * one for the heap later.)
+	 * one for the heap later.)  Note that in merge-only state this array
+	 * can be NULL.
 	 */
-	FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
-	pfree(state->memtuples);
-	state->memtuples = NULL;
+	if (state->memtuples)
+	{
+		FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
+		pfree(state->memtuples);
+		state->memtuples = NULL;
+	}
 
 	/*
 	 * Initialize the slab allocator.  We need one slab slot per input tape,
@@ -3157,3 +3170,189 @@ ssup_datum_int32_cmp(Datum x, Datum y, SortSupport ssup)
 	else
 		return 0;
 }
+
+/*
+ *    tuplemerge_begin_common
+ *
+ * Create a new Tuplesortstate for performing a merge only.  This is used
+ * when we know that the input is already sorted but stored in multiple
+ * tapes, so we only have to merge them.
+ *
+ * Unlike tuplesort_begin_common, it does not accept sortopt, because none
+ * of the current options (random access and bounded sort) are supported by
+ * merge.
+ */
+Tuplesortstate *
+tuplemerge_begin_common(int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state;
+	MemoryContext maincontext;
+	MemoryContext sortcontext;
+	MemoryContext oldcontext;
+
+	/*
+	 * Memory context surviving tuplesort_reset.  This memory context holds
+	 * data which is useful to keep while sorting multiple similar batches.
+	 */
+	maincontext = AllocSetContextCreate(CurrentMemoryContext,
+										"TupleMerge main",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Create a working memory context for one sort operation.  The content of
+	 * this context is deleted by tuplesort_reset.
+	 */
+	sortcontext = AllocSetContextCreate(maincontext,
+										"TupleMerge merge",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Make the Tuplesortstate within the per-sortstate context.  This way, we
+	 * don't need a separate pfree() operation for it at shutdown.
+	 */
+	oldcontext = MemoryContextSwitchTo(maincontext);
+
+	state = (Tuplesortstate *) palloc0(sizeof(Tuplesortstate));
+
+	if (trace_sort)
+		pg_rusage_init(&state->ru_start);
+
+	state->base.sortopt = TUPLESORT_NONE;
+	state->base.tuples = true;
+	state->abbrevNext = 10;
+
+	/*
+	 * workMem is forced to be at least 64KB, the current minimum valid value
+	 * for the work_mem GUC.  This is a defense against parallel sort callers
+	 * that divide out memory among many workers in a way that leaves each
+	 * with very little memory.
+	 */
+	state->allowedMem = Max(workMem, 64) * (int64) 1024;
+	state->base.sortcontext = sortcontext;
+	state->base.maincontext = maincontext;
+
+	/*
+	 * After all of the other non-parallel-related state, we set up all of
+	 * the state needed for each batch.
+	 */
+
+	/*
+	 * Merging does not accept RANDOMACCESS, so the only possible tuple
+	 * context is Bump, which saves some cycles.
+	 */
+	state->base.tuplecontext = BumpContextCreate(state->base.sortcontext,
+												 "Caller tuples",
+												 ALLOCSET_DEFAULT_SIZES);
+	
+	state->status = TSS_BUILDRUNS;
+	state->bounded = false;
+	state->boundUsed = false;
+	state->availMem = state->allowedMem;
+	
+	/*
+	 * When performing a merge we do not need the in-memory tuple array.
+	 * We leave memtuples NULL but keep the bookkeeping fields initialized,
+	 * so that if someone invokes an inappropriate function in merge mode
+	 * we do not crash outright.
+	 */
+	state->memtuples = NULL;
+	state->memtupcount = 0;
+	state->memtupsize = INITIAL_MEMTUPSIZE;
+	state->growmemtuples = true;
+	state->slabAllocatorUsed = false;
+
+	/*
+	 * Tape variables (inputTapes, outputTapes, etc.) will be initialized by
+	 * inittapes(), if needed.
+	 */
+	state->result_tape = NULL;	/* flag that result tape has not been formed */
+	state->tapeset = NULL;
+	
+	inittapes(state, true);
+
+	/*
+	 * Initialize parallel-related state based on coordination information
+	 * from caller
+	 */
+	if (!coordinate)
+	{
+		/* Serial sort */
+		state->shared = NULL;
+		state->worker = -1;
+		state->nParticipants = -1;
+	}
+	else if (coordinate->isWorker)
+	{
+		/* Parallel worker produces exactly one final run from all input */
+		state->shared = coordinate->sharedsort;
+		state->worker = worker_get_identifier(state);
+		state->nParticipants = -1;
+	}
+	else
+	{
+		/* Parallel leader state only used for final merge */
+		state->shared = coordinate->sharedsort;
+		state->worker = -1;
+		state->nParticipants = coordinate->nParticipants;
+		Assert(state->nParticipants >= 1);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_start_run(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+		return;
+
+	selectnewtape(state);
+	state->memtupcount = 0;
+}
+
+void
+tuplemerge_performmerge(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+	{
+		/*
+		 * We started a new run, but no tuples were written to it.
+		 * mergeruns expects each run to have at least one tuple,
+		 * otherwise it will fail to even fill the initial merge heap.
+		 */
+		state->nOutputRuns--;
+	}
+	else
+		state->memtupcount = 0;
+
+	mergeruns(state);
+
+	state->current = 0;
+	state->eof_reached = false;
+	state->markpos_block = 0L;
+	state->markpos_offset = 0;
+	state->markpos_eof = false;
+}
+
+void
+tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple, Size tuplen)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->base.sortcontext);
+
+	Assert(state->destTape);	
+	WRITETUP(state, state->destTape, tuple);
+
+	MemoryContextSwitchTo(oldcxt);
+	
+	state->memtupcount++;
+}
+
+void
+tuplemerge_end_run(Tuplesortstate *state)
+{
+	if (state->memtupcount != 0)
+	{
+		markrunend(state->destTape);
+	}
+}
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index a1f5c19ee97..96cc66900fa 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -2070,3 +2070,108 @@ readtup_datum(Tuplesortstate *state, SortTuple *stup,
 	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
 		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
 }
+
+Tuplesortstate *
+tuplemerge_begin_heap(TupleDesc tupDesc,
+					  int nkeys, AttrNumber *attNums,
+					  Oid *sortOperators, Oid *sortCollations,
+					  bool *nullsFirstFlags,
+					  int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state = tuplemerge_begin_common(workMem, coordinate);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+
+	Assert(nkeys > 0);
+
+	if (trace_sort)
+		elog(LOG,
+			 "begin tuple merge: nkeys = %d, workMem = %d", nkeys, workMem);
+
+	base->nKeys = nkeys;
+
+	TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
+								false,	/* no unique check */
+								nkeys,
+								workMem,
+								false,
+								PARALLEL_SORT(coordinate));
+
+	base->removeabbrev = removeabbrev_heap;
+	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
+	base->writetup = writetup_heap;
+	base->readtup = readtup_heap;
+	base->haveDatum1 = true;
+	base->arg = tupDesc;		/* assume we need not copy tupDesc */
+
+	/* Prepare SortSupport data for each column */
+	base->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		SortSupport sortKey = base->sortKeys + i;
+
+		Assert(attNums[i] != 0);
+		Assert(sortOperators[i] != 0);
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization
+	 * is only of value to pass-by-value types anyway, whereas abbreviated
+	 * keys are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !base->sortKeys->abbrev_converter)
+		base->onlyKey = base->sortKeys;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+	TupleDesc	tupDesc = (TupleDesc) base->arg;
+	SortTuple	stup;
+	MinimalTuple tuple;
+	HeapTupleData htup;
+	Size		tuplen;
+
+	/* copy the tuple into sort storage */
+	tuple = ExecCopySlotMinimalTuple(slot);
+	stup.tuple = tuple;
+	/* set up first-column key value */
+	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
+	stup.datum1 = heap_getattr(&htup,
+							   base->sortKeys[0].ssup_attno,
+							   tupDesc,
+							   &stup.isnull1);
+
+	/* GetMemoryChunkSpace is not supported for bump contexts */
+	if (TupleSortUseBumpTupleCxt(base->sortopt))
+		tuplen = MAXALIGN(tuple->t_len);
+	else
+		tuplen = GetMemoryChunkSpace(tuple);
+
+	tuplemerge_puttuple_common(state, &stup, tuplen);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 90c8ba7c779..57e53d94a17 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -393,8 +393,16 @@ extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
+
+/* 
+ * Which strategy to use for aggregation/grouping
+ */
+#define GROUPING_STRATEGY_SORT			1
+#define GROUPING_STRATEGY_HASH			(1 << 1)
+#define GROUPING_STRATEGY_INDEX			(1 << 2)
+
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
-									bool doSort, bool doHash, bool nullcheck);
+									int groupStrategy, bool nullcheck);
 extern ExprState *ExecBuildHash32FromAttrs(TupleDesc desc,
 										   const TupleTableSlotOps *ops,
 										   FmgrInfo *hashfunctions,
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 6c4891bbaeb..8361d000878 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -321,6 +321,33 @@ typedef struct AggStatePerHashData
 	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */
 }			AggStatePerHashData;
 
+/* 
+ * AggStatePerIndexData - per-index state
+ *
+ * Logic is the same as for AggStatePerHashData - one of these for each
+ * grouping set.
+ */
+typedef struct AggStatePerIndexData
+{
+	TupleIndex	index;			/* current in-memory index data */
+	MemoryContext metacxt;		/* memory context containing TupleIndex */
+	MemoryContext tempctx;		/* short-lived context */
+	TupleTableSlot *indexslot; 	/* slot for loading index */
+	int			numCols;		/* total number of columns in index tuple */
+	int			numKeyCols;		/* number of key columns in index tuple */
+	int			largestGrpColIdx;	/* largest col required for comparison */
+	AttrNumber *idxKeyColIdxInput;	/* key column indices in input slot */
+	AttrNumber *idxKeyColIdxIndex;	/* key column indices in index tuples */
+	TupleIndexIteratorData iter;	/* iterator state for index */
+	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */	
+
+	/* state used only for spill mode */
+	AttrNumber	*idxKeyColIdxTL;	/* key column indices in target list */
+	FmgrInfo    *hashfunctions;	/* tuple hashing function */
+	ExprState   *indexhashexpr;	/* ExprState for hashing index datatype(s) */
+	ExprContext *exprcontext;	/* expression context */
+	TupleTableSlot *mergeslot;	/* slot for loading tuple during merge */
+}			AggStatePerIndexData;
 
 extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
@@ -328,9 +355,9 @@ extern void ExecReScanAgg(AggState *node);
 
 extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 								Size transitionSpace);
-extern void hash_agg_set_limits(double hashentrysize, double input_groups,
-								int used_bits, Size *mem_limit,
-								uint64 *ngroups_limit, int *num_partitions);
+extern void agg_set_limits(double hashentrysize, double input_groups,
+						   int used_bits, Size *mem_limit,
+						   uint64 *ngroups_limit, int *num_partitions);
 
 /* parallel instrumentation support */
 extern void ExecAggEstimate(AggState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index c9b69a96b26..27e6afcd32c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2613,6 +2613,7 @@ typedef struct AggStatePerTransData *AggStatePerTrans;
 typedef struct AggStatePerGroupData *AggStatePerGroup;
 typedef struct AggStatePerPhaseData *AggStatePerPhase;
 typedef struct AggStatePerHashData *AggStatePerHash;
+typedef struct AggStatePerIndexData *AggStatePerIndex;
 
 typedef struct AggState
 {
@@ -2628,17 +2629,18 @@ typedef struct AggState
 	AggStatePerAgg peragg;		/* per-Aggref information */
 	AggStatePerTrans pertrans;	/* per-Trans state information */
 	ExprContext *hashcontext;	/* econtexts for long-lived data (hashtable) */
+	ExprContext *indexcontext;	/* econtexts for long-lived data (index) */
 	ExprContext **aggcontexts;	/* econtexts for long-lived data (per GS) */
 	ExprContext *tmpcontext;	/* econtext for input expressions */
-#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
+#define FIELDNO_AGGSTATE_CURAGGCONTEXT 15
 	ExprContext *curaggcontext; /* currently active aggcontext */
 	AggStatePerAgg curperagg;	/* currently active aggregate, if any */
-#define FIELDNO_AGGSTATE_CURPERTRANS 16
+#define FIELDNO_AGGSTATE_CURPERTRANS 17
 	AggStatePerTrans curpertrans;	/* currently active trans state, if any */
 	bool		input_done;		/* indicates end of input */
 	bool		agg_done;		/* indicates completion of Agg scan */
 	int			projected_set;	/* The last projected grouping set */
-#define FIELDNO_AGGSTATE_CURRENT_SET 20
+#define FIELDNO_AGGSTATE_CURRENT_SET 21
 	int			current_set;	/* The current grouping set being evaluated */
 	Bitmapset  *grouped_cols;	/* grouped cols in current projection */
 	List	   *all_grouped_cols;	/* list of all grouped cols in DESC order */
@@ -2660,32 +2662,43 @@ typedef struct AggState
 	int			num_hashes;
 	MemoryContext hash_metacxt; /* memory for hash table bucket array */
 	MemoryContext hash_tuplescxt;	/* memory for hash table tuples */
-	struct LogicalTapeSet *hash_tapeset;	/* tape set for hash spill tapes */
-	struct HashAggSpill *hash_spills;	/* HashAggSpill for each grouping set,
-										 * exists only during first pass */
-	TupleTableSlot *hash_spill_rslot;	/* for reading spill files */
-	TupleTableSlot *hash_spill_wslot;	/* for writing spill files */
-	List	   *hash_batches;	/* hash batches remaining to be processed */
-	bool		hash_ever_spilled;	/* ever spilled during this execution? */
-	bool		hash_spill_mode;	/* we hit a limit during the current batch
-									 * and we must not create new groups */
-	Size		hash_mem_limit; /* limit before spilling hash table */
-	uint64		hash_ngroups_limit; /* limit before spilling hash table */
-	int			hash_planned_partitions;	/* number of partitions planned
-											 * for first pass */
-	double		hashentrysize;	/* estimate revised during execution */
-	Size		hash_mem_peak;	/* peak hash table memory usage */
-	uint64		hash_ngroups_current;	/* number of groups currently in
-										 * memory in all hash tables */
-	uint64		hash_disk_used; /* kB of disk space used */
-	int			hash_batches_used;	/* batches used during entire execution */
-
 	AggStatePerHash perhash;	/* array of per-hashtable data */
 	AggStatePerGroup *hash_pergroup;	/* grouping set indexed array of
 										 * per-group pointers */
+	/* Fields used for managing spill mode in hash and index aggs */
+	struct LogicalTapeSet *spill_tapeset;	/* tape set for hash spill tapes */
+	struct HashAggSpill *spills;	/* HashAggSpill for each grouping set,
+									 * exists only during first pass */
+	TupleTableSlot *spill_rslot;	/* for reading spill files */
+	TupleTableSlot *spill_wslot;	/* for writing spill files */
+	List	   *spill_batches;	/* hash batches remaining to be processed */
+
+	bool		spill_ever_happened;	/* ever spilled during this execution? */
+	bool		spill_mode;	/* we hit a limit during the current batch
+							 * and we must not create new groups */
+	Size		spill_mem_limit; /* limit before spilling hash table or index */
+	uint64		spill_ngroups_limit; /* limit before spilling hash table or index */
+	int			spill_planned_partitions;	/* number of partitions planned
+											 * for first pass */
+	double		hashentrysize;	/* estimate revised during execution */
+	Size		spill_mem_peak;	/* peak memory usage of hash table or index */
+	uint64		spill_ngroups_current;	/* number of groups currently in
+										 * memory in all hash tables */
+	uint64		spill_disk_used; /* kB of disk space used */
+	int			spill_batches_used;	/* batches used during entire execution */
+
+	/* these fields are used in AGG_INDEXED mode: */
+	AggStatePerIndex perindex;	/* pointer to per-index state data */
+	bool			index_filled;	/* index filled yet? */
+	MemoryContext	index_metacxt;	/* memory for index structure */
+	MemoryContext	index_nodecxt;	/* memory for index nodes */
+	MemoryContext	index_entrycxt;	/* memory for index entries */
+	Sort		   *index_sort;		/* ordering information for index */
+	Tuplesortstate *mergestate;		/* state for merging projected tuples if
+									 * spill occurred */
 
 	/* support for evaluation of agg input expressions: */
-#define FIELDNO_AGGSTATE_ALL_PERGROUPS 54
+#define FIELDNO_AGGSTATE_ALL_PERGROUPS 62
 	AggStatePerGroup *all_pergroups;	/* array of first ->pergroups, than
 										 * ->hash_pergroup */
 	SharedAggInfo *shared_info; /* one entry per worker */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fb3957e75e5..b0e2d781c01 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -365,6 +365,7 @@ typedef enum AggStrategy
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
 	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_INDEX,					/* grouped agg, build index for input */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..b19dacf5de4 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1219,7 +1219,7 @@ typedef struct Agg
 	/* grouping sets to use */
 	List	   *groupingSets;
 
-	/* chained Agg/Sort nodes */
+	/* chained Agg/Sort nodes; for AGG_INDEX this contains a single Sort node */
 	List	   *chain;
 } Agg;
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 0bf55902aa1..f372c3e7e0a 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -475,6 +475,21 @@ extern GinTuple *tuplesort_getgintuple(Tuplesortstate *state, Size *len,
 									   bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
-
+/*
+ * Special state for merge mode.
+ */
+extern Tuplesortstate *tuplemerge_begin_common(int workMem,
+											   SortCoordinate coordinate);
+extern Tuplesortstate *tuplemerge_begin_heap(TupleDesc tupDesc,
+											int nkeys, AttrNumber *attNums,
+											Oid *sortOperators, Oid *sortCollations,
+											bool *nullsFirstFlags,
+											int workMem, SortCoordinate coordinate);
+extern void tuplemerge_start_run(Tuplesortstate *state);
+extern void tuplemerge_end_run(Tuplesortstate *state);
+extern void tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple,
+									   Size tuplen);
+extern void tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot);
+extern void tuplemerge_performmerge(Tuplesortstate *state);
 
 #endif							/* TUPLESORT_H */
-- 
2.43.0
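As a rough illustration of the run lifecycle the new tuplemerge_* API expects (start a run, append pre-sorted tuples, end the run, then one final k-way merge of all runs), here is a toy Python sketch. `ToyMerger` and its plain lists standing in for logical tapes are purely illustrative, not the executor code:

```python
import heapq

# Toy model of the tuplemerge_* lifecycle from the patch: the caller
# produces several already-sorted runs (lists standing in for logical
# tapes) and a final pass merges them.  All names are illustrative.

class ToyMerger:
    def __init__(self):
        self.runs = []        # each run plays the role of one tape
        self.current = None

    def start_run(self):      # ~ tuplemerge_start_run(): select a new tape
        self.current = []
        self.runs.append(self.current)

    def put_tuple(self, t):   # ~ tuplemerge_puttuple_common(): write to run
        self.current.append(t)

    def end_run(self):        # ~ tuplemerge_end_run(): mark run end
        self.current = None

    def perform_merge(self):  # ~ tuplemerge_performmerge(): k-way merge
        # skip empty runs, as the patch decrements nOutputRuns for them
        return list(heapq.merge(*(r for r in self.runs if r)))

m = ToyMerger()
for run in ([1, 4, 7], [2, 5], []):
    m.start_run()
    for t in run:
        m.put_tuple(t)
    m.end_run()
print(m.perform_merge())   # [1, 2, 4, 5, 7]
```

The empty third run mirrors the `memtupcount == 0` case that tuplemerge_performmerge has to handle before calling mergeruns.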

v3-0003-make-use-of-IndexAggregate-in-planner-and-explain.patchtext/x-patch; charset=UTF-8; name=v3-0003-make-use-of-IndexAggregate-in-planner-and-explain.patchDownload
From f427c3afa16b294523a82fc67decf0fffe6f8180 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:34:18 +0300
Subject: [PATCH v3 3/5] make use of IndexAggregate in planner and explain

This commit adds usage of IndexAggregate in the planner and EXPLAIN (ANALYZE).

We calculate the cost of IndexAggregate and add an AGG_INDEX node to the
pathlist.  The cost of this node covers building the B+tree in memory,
spilling to disk, and the final external merge.

For EXPLAIN there is only a small change - show sort information in "Group Key".
---
 src/backend/commands/explain.c                | 101 +++++++++++--
 src/backend/optimizer/path/costsize.c         | 137 +++++++++++++-----
 src/backend/optimizer/plan/createplan.c       |  15 +-
 src/backend/optimizer/plan/planner.c          |  35 +++++
 src/backend/optimizer/util/pathnode.c         |   9 ++
 src/backend/utils/misc/guc_parameters.dat     |   7 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |   3 +-
 src/include/optimizer/cost.h                  |   1 +
 9 files changed, 251 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..9e16c547b06 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -134,7 +134,7 @@ static void show_recursive_union_info(RecursiveUnionState *rstate,
 									  ExplainState *es);
 static void show_memoize_info(MemoizeState *mstate, List *ancestors,
 							  ExplainState *es);
-static void show_hashagg_info(AggState *aggstate, ExplainState *es);
+static void show_agg_spill_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1556,6 +1556,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						pname = "MixedAggregate";
 						strategy = "Mixed";
 						break;
+					case AGG_INDEX:
+						pname = "IndexAggregate";
+						strategy = "Indexed";
+						break;
 					default:
 						pname = "Aggregate ???";
 						strategy = "???";
@@ -2200,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
-			show_hashagg_info((AggState *) planstate, es);
+			show_agg_spill_info((AggState *) planstate, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
@@ -2631,6 +2635,24 @@ show_agg_keys(AggState *astate, List *ancestors,
 
 		if (plan->groupingSets)
 			show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+		else if (plan->aggstrategy == AGG_INDEX)
+			{
+				Sort	*sort = astate->index_sort;
+
+				/*
+				 * Index Agg reorders the GROUP BY keys to match ORDER BY,
+				 * so the keys are the same, but we should also show other
+				 * useful details of the ordering used, such as direction.
+				 */
+				Assert(sort != NULL);
+				show_sort_group_keys(outerPlanState(astate), "Group Key",
+									 plan->numCols, 0,
+									 sort->sortColIdx,
+									 sort->sortOperators,
+									 sort->collations,
+									 sort->nullsFirst,
+									 ancestors, es);
+			}
 		else
 			show_sort_group_keys(outerPlanState(astate), "Group Key",
 								 plan->numCols, 0, plan->grpColIdx,
@@ -3735,47 +3757,67 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 }
 
 /*
- * Show information on hash aggregate memory usage and batches.
+ * Show information on hash or index aggregate memory usage and batches.
  */
 static void
-show_hashagg_info(AggState *aggstate, ExplainState *es)
+show_agg_spill_info(AggState *aggstate, ExplainState *es)
 {
 	Agg		   *agg = (Agg *) aggstate->ss.ps.plan;
-	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->hash_mem_peak);
+	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->spill_mem_peak);
 
 	if (agg->aggstrategy != AGG_HASHED &&
-		agg->aggstrategy != AGG_MIXED)
+		agg->aggstrategy != AGG_MIXED &&
+		agg->aggstrategy != AGG_INDEX)
 		return;
 
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		if (es->costs)
 			ExplainPropertyInteger("Planned Partitions", NULL,
-								   aggstate->hash_planned_partitions, es);
+								   aggstate->spill_planned_partitions, es);
 
 		/*
 		 * During parallel query the leader may have not helped out.  We
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			ExplainPropertyInteger("HashAgg Batches", NULL,
-								   aggstate->hash_batches_used, es);
+								   aggstate->spill_batches_used, es);
 			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
 			ExplainPropertyInteger("Disk Usage", "kB",
-								   aggstate->hash_disk_used, es);
+								   aggstate->spill_disk_used, es);
+		}
+
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64 spaceUsed;
+			
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			ExplainPropertyText("Merge Method", mergeMethod, es);
+			ExplainPropertyInteger("Merge Space Used", "kB", spaceUsed, es);
+			ExplainPropertyText("Merge Space Type", spaceType, es);
 		}
 	}
 	else
 	{
 		bool		gotone = false;
 
-		if (es->costs && aggstate->hash_planned_partitions > 0)
+		if (es->costs && aggstate->spill_planned_partitions > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str, "Planned Partitions: %d",
-							 aggstate->hash_planned_partitions);
+							 aggstate->spill_planned_partitions);
 			gotone = true;
 		}
 
@@ -3784,7 +3826,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			if (!gotone)
 				ExplainIndentText(es);
@@ -3792,17 +3834,44 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 				appendStringInfoSpaces(es->str, 2);
 
 			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-							 aggstate->hash_batches_used, memPeakKb);
+							 aggstate->spill_batches_used, memPeakKb);
 			gotone = true;
 
 			/* Only display disk usage if we spilled to disk */
-			if (aggstate->hash_batches_used > 1)
+			if (aggstate->spill_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-								 aggstate->hash_disk_used);
+								 aggstate->spill_disk_used);
 			}
 		}
 
+		/* For index aggregate show stats for final merging */
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64 spaceUsed;
+			
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			/*
+			 * If we got here, the previous check (for the memory peak) must
+			 * have succeeded (we cannot reach the merge without any in-memory
+			 * operations).  Do not check other state; just start a new line.
+			 */
+			appendStringInfoChar(es->str, '\n');
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Merge Method: %s  %s: " INT64_FORMAT "kB",
+							 mergeMethod, spaceType, spaceUsed);
+			gotone = true;
+		}
+
 		if (gotone)
 			appendStringInfoChar(es->str, '\n');
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index a39cc793b4d..a966fb76113 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool		enable_tidscan = true;
 bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
+bool		enable_indexagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_memoize = true;
@@ -1848,6 +1849,32 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 									rterm->pathtarget->width);
 }
 
+/*
+ * cost_tuplemerge
+ *		Determines the cost of an external merge and adds it to *cost.
+ */
+static void
+cost_tuplemerge(double availMem, double input_bytes, double ntuples,
+				Cost comparison_cost, Cost *cost)
+{
+	double		npages = ceil(input_bytes / BLCKSZ);
+	double		nruns = input_bytes / availMem;
+	double		mergeorder = tuplesort_merge_order(availMem);
+	double		log_runs;
+	double		npageaccesses;
+
+	/* Compute logM(r) as log(r) / log(M) */
+	if (nruns > mergeorder)
+		log_runs = ceil(log(nruns) / log(mergeorder));
+	else
+		log_runs = 1.0;
+
+	npageaccesses = 2.0 * npages * log_runs;
+
+	/* Assume 3/4ths of accesses are sequential, 1/4th are not */
+	*cost += npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
+}
+
 /*
  * cost_tuplesort
  *	  Determines and returns the cost of sorting a relation using tuplesort,
@@ -1922,11 +1949,6 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		/*
 		 * We'll have to use a disk-based sort of all the tuples
 		 */
-		double		npages = ceil(input_bytes / BLCKSZ);
-		double		nruns = input_bytes / sort_mem_bytes;
-		double		mergeorder = tuplesort_merge_order(sort_mem_bytes);
-		double		log_runs;
-		double		npageaccesses;
 
 		/*
 		 * CPU costs
@@ -1936,16 +1958,8 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		*startup_cost = comparison_cost * tuples * LOG2(tuples);
 
 		/* Disk costs */
-
-		/* Compute logM(r) as log(r) / log(M) */
-		if (nruns > mergeorder)
-			log_runs = ceil(log(nruns) / log(mergeorder));
-		else
-			log_runs = 1.0;
-		npageaccesses = 2.0 * npages * log_runs;
-		/* Assume 3/4ths of accesses are sequential, 1/4th are not */
-		*startup_cost += npageaccesses *
-			(seq_page_cost * 0.75 + random_page_cost * 0.25);
+		cost_tuplemerge(sort_mem_bytes, input_bytes, tuples, comparison_cost,
+						startup_cost);
 	}
 	else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
 	{
@@ -2770,7 +2784,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
-	else
+	else if (aggstrategy == AGG_HASHED)
 	{
 		/* must be AGG_HASHED */
 		startup_cost = input_total_cost;
@@ -2788,6 +2802,50 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
+	else
+	{
+		/* must be AGG_INDEX */
+		startup_cost = input_total_cost;
+		if (!enable_indexagg)
+			++disabled_nodes;
+
+		/* these match AGG_HASHED */
+		startup_cost += aggcosts->transCost.startup;
+		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
+		startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
+		startup_cost += aggcosts->finalCost.startup;
+
+		/* cost of btree top-down traversal */
+		startup_cost +=   LOG2(numGroups)	/* amount of comparisons */
+						* (2.0 * cpu_operator_cost)	/* comparison cost */
+						* input_tuples;
+
+		total_cost = startup_cost;
+		total_cost += aggcosts->finalCost.per_tuple * numGroups;
+		total_cost += cpu_tuple_cost * numGroups;
+		output_tuples = numGroups;
+	}
+
+	/*
+	 * If there are quals (HAVING quals), account for their cost and
+	 * selectivity.  Process it before disk spill logic, because output
+	 * cardinality is required for AGG_INDEX.
+	 */
+	if (quals)
+	{
+		QualCost	qual_cost;
+
+		cost_qual_eval(&qual_cost, quals, root);
+		startup_cost += qual_cost.startup;
+		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+
+		output_tuples = clamp_row_est(output_tuples *
+									  clauselist_selectivity(root,
+															 quals,
+															 0,
+															 JOIN_INNER,
+															 NULL));
+	}
 
 	/*
 	 * Add the disk costs of hash aggregation that spills to disk.
@@ -2802,7 +2860,7 @@ cost_agg(Path *path, PlannerInfo *root,
 	 * Accrue writes (spilled tuples) to startup_cost and to total_cost;
 	 * accrue reads only to total_cost.
 	 */
-	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED)
+	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED || aggstrategy == AGG_INDEX)
 	{
 		double		pages;
 		double		pages_written = 0.0;
@@ -2814,6 +2872,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		uint64		ngroups_limit;
 		int			num_partitions;
 		int			depth;
+		bool		canspill;
 
 		/*
 		 * Estimate number of batches based on the computed limits. If less
@@ -2823,8 +2882,9 @@ cost_agg(Path *path, PlannerInfo *root,
 		hashentrysize = hash_agg_entry_size(list_length(root->aggtransinfos),
 											input_width,
 											aggcosts->transitionSpace);
-		hash_agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
-							&ngroups_limit, &num_partitions);
+		agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
+					   &ngroups_limit, &num_partitions);
+		canspill = num_partitions != 0;
 
 		nbatches = Max((numGroups * hashentrysize) / mem_limit,
 					   numGroups / ngroups_limit);
@@ -2861,26 +2921,27 @@ cost_agg(Path *path, PlannerInfo *root,
 		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
 		startup_cost += spill_cost;
 		total_cost += spill_cost;
-	}
-
-	/*
-	 * If there are quals (HAVING quals), account for their cost and
-	 * selectivity.
-	 */
-	if (quals)
-	{
-		QualCost	qual_cost;
 
-		cost_qual_eval(&qual_cost, quals, root);
-		startup_cost += qual_cost.startup;
-		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
-
-		output_tuples = clamp_row_est(output_tuples *
-									  clauselist_selectivity(root,
-															 quals,
-															 0,
-															 JOIN_INNER,
-															 NULL));
+		/*
+		 * IndexAgg requires a final external merge stage, but only if a
+		 * spill can occur; otherwise everything is processed in memory.
+		 */
+		if (aggstrategy == AGG_INDEX && canspill)
+		{
+			double	output_bytes;
+			Cost	comparison_cost;
+			Cost	merge_cost = 0;
+
+			/* size of all projected tuples */
+			output_bytes = path->pathtarget->width * output_tuples;
+			/* default comparison cost */
+			comparison_cost = 2.0 * cpu_operator_cost;
+
+			cost_tuplemerge(work_mem, output_bytes, output_tuples,
+							comparison_cost, &merge_cost);
+			startup_cost += merge_cost;
+			total_cost += merge_cost;
+		}
 	}
 
 	path->rows = output_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bc417f93840..de9bb1ef30b 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2158,6 +2158,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 	List	   *quals;
+	List	   *chain;
+	AttrNumber *grpColIdx;
 
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
@@ -2169,17 +2171,24 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 
 	quals = order_qual_clauses(root, best_path->qual);
 
+	grpColIdx = extract_grouping_cols(best_path->groupClause, subplan->targetlist);
+
+	/* For index aggregation we should consider the desired sorting order. */
+	if (best_path->aggstrategy == AGG_INDEX)
+		chain = list_make1(make_sort_from_groupcols(best_path->groupClause, grpColIdx, subplan));
+	else
+		chain = NIL;
+
 	plan = make_agg(tlist, quals,
 					best_path->aggstrategy,
 					best_path->aggsplit,
 					list_length(best_path->groupClause),
-					extract_grouping_cols(best_path->groupClause,
-										  subplan->targetlist),
+					grpColIdx,
 					extract_grouping_ops(best_path->groupClause),
 					extract_grouping_collations(best_path->groupClause,
 												subplan->targetlist),
 					NIL,
-					NIL,
+					chain,
 					best_path->numGroups,
 					best_path->transitionSpace,
 					subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8b22c30559b..cfd2f3ff3a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3877,6 +3877,21 @@ create_grouping_paths(PlannerInfo *root,
 			 (gd ? gd->any_hashable : grouping_is_hashable(root->processed_groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
+		/*
+		 * Determine whether we should consider an index-based implementation
+		 * of grouping.
+		 *
+		 * This is more restrictive: the grouping must not only be sortable
+		 * (for the purposes of the B-tree) but also hashable, so that we can
+		 * effectively spill tuples and later process each batch.
+		 */
+		if (   gd == NULL
+			&& root->numOrderedAggs == 0
+			&& parse->groupClause != NIL
+			&& grouping_is_sortable(root->processed_groupClause)
+			&& grouping_is_hashable(root->processed_groupClause))
+			flags |= GROUPING_CAN_USE_INDEX;
+
 		/*
 		 * Determine whether partial aggregation is possible.
 		 */
@@ -7108,6 +7123,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups = 0;
@@ -7329,6 +7345,25 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 	}
 
+	if (can_index)
+	{
+		/* 
+		 * Generate IndexAgg path.
+		 */
+		Assert(!parse->groupingSets);
+		add_path(grouped_rel, (Path *)
+				 create_agg_path(root,
+								 grouped_rel,
+								 cheapest_path,
+								 grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_SIMPLE,
+								 root->processed_groupClause,
+								 havingQual,
+								 agg_costs,
+								 dNumGroups));
+	}
+
 	/*
 	 * When partitionwise aggregate is used, we might have fully aggregated
 	 * paths in the partial pathlist, because add_paths_to_append_rel() will
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b6be4ddbd01..2bac26055a7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3030,6 +3030,15 @@ create_agg_path(PlannerInfo *root,
 		else
 			pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	}
+	else if (aggstrategy == AGG_INDEX)
+	{
+		/* 
+		 * When using index aggregation all grouping columns will be used as
+		 * comparator keys, so output is always sorted.
+		 */
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
+																root->processed_tlist);
+	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 3b9d8349078..776ccd9e2fd 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -868,6 +868,13 @@
   boot_val => 'true',
 },
 
+{ name => 'enable_indexagg', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of index aggregation plans.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'enable_indexagg',
+  boot_val => 'true',
+},
+
 { name => 'enable_indexonlyscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
   short_desc => 'Enables the planner\'s use of index-only-scan plans.',
   flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..307b9ee660d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -410,6 +410,7 @@
 #enable_hashagg = on
 #enable_hashjoin = on
 #enable_incremental_sort = on
+#enable_indexagg = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..f4b2d35b1d9 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -3518,7 +3518,8 @@ typedef struct JoinPathExtraData
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
-#define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_CAN_USE_INDEX		0x0004
+#define GROUPING_CAN_PARTIAL_AGG	0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..5d03b5971bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
 extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_indexagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_memoize;
-- 
2.43.0

Attachment: v3-0004-add-support-for-Partial-IndexAggregate.patch (text/x-patch)
From 1b78737485daf994949995655fa4898672276c4c Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Thu, 11 Dec 2025 14:30:37 +0300
Subject: [PATCH v3 4/5] add support for Partial IndexAggregate

IndexAggregate now supports partial aggregates. The main problem was
that partial aggregation creates a SortGroupClause for the same
expression as in the target list but with a different sortgroupref, so
make_pathkeys_for_sortclauses failed to find the required target list
entry and threw an ERROR.

To fix this we now explicitly pass pathkeys to create_agg_path (only
for AGG_INDEX so far), so the caller is responsible for looking up and
building the pathkeys list.
---
 src/backend/optimizer/path/allpaths.c  | 76 ++++++++++++++++++++
 src/backend/optimizer/plan/planner.c   | 98 ++++++++++++++++++++++++--
 src/backend/optimizer/prep/prepunion.c |  2 +
 src/backend/optimizer/util/pathnode.c  | 16 +++--
 src/include/optimizer/pathnode.h       |  1 +
 5 files changed, 185 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4c43fd0b19b..5fcac30af84 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3446,6 +3446,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 	AggClauseCosts agg_costs;
 	bool		can_hash;
 	bool		can_sort;
+	bool		can_index;
 	Path	   *cheapest_total_path = NULL;
 	Path	   *cheapest_partial_path = NULL;
 	double		dNumGroups = 0;
@@ -3498,6 +3499,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 	can_hash = (agg_info->group_clauses != NIL &&
 				grouping_is_hashable(agg_info->group_clauses));
 
+	/*
+	 * Determine whether we should consider index-based implementations of
+	 * grouping.
+	 */
+	can_index = can_sort && can_hash;
+
 	/*
 	 * Consider whether we should generate partially aggregated non-partial
 	 * paths.  We can only do this if we have a non-partial path.
@@ -3615,6 +3622,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 											AGGSPLIT_INITIAL_SERIAL,
 											agg_info->group_clauses,
 											NIL,
+											NIL,
 											&agg_costs,
 											dNumGroups);
 
@@ -3691,6 +3699,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 											AGGSPLIT_INITIAL_SERIAL,
 											agg_info->group_clauses,
 											NIL,
+											NIL,
 											&agg_costs,
 											dNumPartialGroups);
 
@@ -3727,6 +3736,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 										AGGSPLIT_INITIAL_SERIAL,
 										agg_info->group_clauses,
 										NIL,
+										NIL,
 										&agg_costs,
 										dNumGroups);
 
@@ -3762,6 +3772,72 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 										AGGSPLIT_INITIAL_SERIAL,
 										agg_info->group_clauses,
 										NIL,
+										NIL,
+										&agg_costs,
+										dNumPartialGroups);
+
+		add_partial_path(grouped_rel, path);
+	}
+
+	if (can_index && cheapest_total_path != NULL)
+	{
+		Path	   *path;
+
+		/*
+		 * Since the path originates from a non-grouped relation that is
+		 * not aware of eager aggregation, we must ensure that it provides
+		 * the correct input for partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   grouped_rel,
+											   cheapest_total_path,
+											   agg_info->agg_input);
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until the
+		 * final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										grouped_rel,
+										path,
+										agg_info->target,
+										AGG_INDEX,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										group_pathkeys,
+										&agg_costs,
+										dNumGroups);
+
+		add_path(grouped_rel, path);
+	}
+
+	if (can_index && cheapest_partial_path != NULL)
+	{
+		Path	   *path;
+
+		/*
+		 * Since the path originates from a non-grouped relation that is not
+		 * aware of eager aggregation, we must ensure that it provides the
+		 * correct input for partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   grouped_rel,
+											   cheapest_partial_path,
+											   agg_info->agg_input);
+
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until the
+		 * final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										grouped_rel,
+										path,
+										agg_info->target,
+										AGG_INDEX,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										group_pathkeys,
 										&agg_costs,
 										dNumPartialGroups);
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index cfd2f3ff3a9..5598943d9f9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3888,6 +3888,7 @@ create_grouping_paths(PlannerInfo *root,
 		if (   gd == NULL
 			&& root->numOrderedAggs == 0
 			&& parse->groupClause != NIL
+			&& parse->groupingSets == NIL
 			&& grouping_is_sortable(root->processed_groupClause)
 			&& grouping_is_hashable(root->processed_groupClause))
 			flags |= GROUPING_CAN_USE_INDEX;
@@ -5030,6 +5031,7 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										 AGGSPLIT_SIMPLE,
 										 root->processed_distinctClause,
 										 NIL,
+										 NIL,
 										 NULL,
 										 numDistinctRows));
 	}
@@ -5238,6 +5240,7 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 								 AGGSPLIT_SIMPLE,
 								 root->processed_distinctClause,
 								 NIL,
+								 NIL,
 								 NULL,
 								 numDistinctRows));
 	}
@@ -7209,6 +7212,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 											 AGGSPLIT_SIMPLE,
 											 info->clauses,
 											 havingQual,
+											 NIL,
 											 agg_costs,
 											 dNumGroups));
 				}
@@ -7280,6 +7284,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 												 AGGSPLIT_FINAL_DESERIAL,
 												 info->clauses,
 												 havingQual,
+												 NIL,
 												 agg_final_costs,
 												 dNumFinalGroups));
 					else
@@ -7321,6 +7326,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 									 AGGSPLIT_SIMPLE,
 									 root->processed_groupClause,
 									 havingQual,
+									 NIL,
 									 agg_costs,
 									 dNumGroups));
 		}
@@ -7340,6 +7346,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 									 AGGSPLIT_FINAL_DESERIAL,
 									 root->processed_groupClause,
 									 havingQual,
+									 NIL,
 									 agg_final_costs,
 									 dNumFinalGroups));
 		}
@@ -7347,10 +7354,10 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 
 	if (can_index)
 	{
-		/* 
-		 * Generate IndexAgg path.
-		 */
-		Assert(!parse->groupingSets);
+		List *pathkeys = make_pathkeys_for_sortclauses(root,
+													   root->processed_groupClause,
+													   root->processed_tlist);
+
 		add_path(grouped_rel, (Path *)
 				 create_agg_path(root,
 								 grouped_rel,
@@ -7360,8 +7367,29 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 								 AGGSPLIT_SIMPLE,
 								 root->processed_groupClause,
 								 havingQual,
+								 pathkeys,
 								 agg_costs,
 								 dNumGroups));
+		
+		/*
+		 * Instead of operating directly on the input relation, we can
+		 * consider finalizing a partially aggregated path.
+		 */
+		if (partially_grouped_rel != NULL)
+		{
+			add_path(grouped_rel, (Path *)
+					 create_agg_path(root,
+									 grouped_rel,
+									 cheapest_partially_grouped_path,
+									 grouped_rel->reltarget,
+									 AGG_INDEX,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 root->processed_groupClause,
+									 havingQual,
+									 pathkeys,
+									 agg_final_costs,
+									 dNumFinalGroups));
+		}
 	}
 
 	/*
@@ -7410,6 +7438,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 
 	/*
 	 * Check whether any partially aggregated paths have been generated
@@ -7561,6 +7590,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 											 AGGSPLIT_INITIAL_SERIAL,
 											 info->clauses,
 											 NIL,
+											 NIL,
 											 agg_partial_costs,
 											 dNumPartialGroups));
 				else
@@ -7619,6 +7649,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 													 AGGSPLIT_INITIAL_SERIAL,
 													 info->clauses,
 													 NIL,
+													 NIL,
 													 agg_partial_costs,
 													 dNumPartialPartialGroups));
 				else
@@ -7650,6 +7681,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 								 AGGSPLIT_INITIAL_SERIAL,
 								 root->processed_groupClause,
 								 NIL,
+								 NIL,
 								 agg_partial_costs,
 								 dNumPartialGroups));
 	}
@@ -7668,6 +7700,62 @@ create_partial_grouping_paths(PlannerInfo *root,
 										 AGGSPLIT_INITIAL_SERIAL,
 										 root->processed_groupClause,
 										 NIL,
+										 NIL,
+										 agg_partial_costs,
+										 dNumPartialPartialGroups));
+	}
+	
+	/*
+	 * Add a partially-grouped IndexAgg Path where possible
+	 */
+	if (can_index && cheapest_total_path != NULL)
+	{
+		List *pathkeys;
+
+		/* This should have been checked previously */
+		Assert(parse->hasAggs || parse->groupClause);
+		
+		pathkeys = make_pathkeys_for_sortclauses(root,
+												 root->processed_groupClause,
+												 root->processed_tlist);
+
+		add_path(partially_grouped_rel, (Path *)
+				 create_agg_path(root,
+								 partially_grouped_rel,
+								 cheapest_total_path,
+								 partially_grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_INITIAL_SERIAL,
+								 root->processed_groupClause,
+								 NIL,
+								 pathkeys,
+								 agg_partial_costs,
+								 dNumPartialGroups));
+	}
+
+	/*
+	 * Now add a partially-grouped IndexAgg partial Path where possible
+	 */
+	if (can_index && cheapest_partial_path != NULL)
+	{
+		List *pathkeys;
+
+		/* This should have been checked previously */
+		Assert(parse->hasAggs || parse->groupClause);
+		
+		pathkeys = make_pathkeys_for_sortclauses(root,
+												 root->processed_groupClause,
+												 root->processed_tlist);
+		add_partial_path(partially_grouped_rel, (Path *)
+						  create_agg_path(root,
+										 partially_grouped_rel,
+										 cheapest_partial_path,
+										 partially_grouped_rel->reltarget,
+										 AGG_INDEX,
+										 AGGSPLIT_INITIAL_SERIAL,
+										 root->processed_groupClause,
+										 NIL,
+										 pathkeys,
 										 agg_partial_costs,
 										 dNumPartialPartialGroups));
 	}
@@ -8829,6 +8917,7 @@ create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										AGGSPLIT_SIMPLE,
 										groupClause,
 										NIL,
+										NIL,
 										NULL,
 										unique_rel->rows);
 
@@ -8971,6 +9060,7 @@ create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										AGGSPLIT_SIMPLE,
 										groupClause,
 										NIL,
+										NIL,
 										NULL,
 										partial_unique_rel->rows);
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index a01b02f3a7b..de6a1558044 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -949,6 +949,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 											AGGSPLIT_SIMPLE,
 											groupList,
 											NIL,
+											NIL,
 											NULL,
 											dNumGroups);
 			add_path(result_rel, path);
@@ -965,6 +966,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 												AGGSPLIT_SIMPLE,
 												groupList,
 												NIL,
+												NIL,
 												NULL,
 												dNumGroups);
 				add_path(result_rel, path);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2bac26055a7..646762be43b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2988,6 +2988,7 @@ create_unique_path(PlannerInfo *root,
  * 'aggsplit' is the Agg node's aggregate-splitting mode
  * 'groupClause' is a list of SortGroupClause's representing the grouping
  * 'qual' is the HAVING quals if any
+ * 'pathkeys', for AGG_INDEX, is the list of PathKeys describing the output ordering
  * 'aggcosts' contains cost info about the aggregate functions to be computed
  * 'numGroups' is the estimated number of groups (1 if not grouping)
  */
@@ -3000,6 +3001,7 @@ create_agg_path(PlannerInfo *root,
 				AggSplit aggsplit,
 				List *groupClause,
 				List *qual,
+				List *pathkeys,
 				const AggClauseCosts *aggcosts,
 				double numGroups)
 {
@@ -3033,11 +3035,17 @@ create_agg_path(PlannerInfo *root,
 	else if (aggstrategy == AGG_INDEX)
 	{
 		/* 
-		 * When using index aggregation all grouping columns will be used as
-		 * comparator keys, so output is always sorted.
+		 * For IndexAgg we must also know the ordering used, just like for
+		 * GroupAgg; for the latter this information is provided by the child
+		 * node, i.e. a Sort. But here we cannot use make_pathkeys_for_sortclauses,
+		 * because with partial aggregates the node will contain a different
+		 * target list and sortgroupref indexes, so that function would not
+		 * find the required entries. Hence the caller must build the pathkeys.
+		 *
+		 * NOTE: pathkeys CAN be NIL, e.g. if the planner decided that all
+		 * values are the same constant.
 		 */
-		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
-																root->processed_tlist);
+		pathnode->path.pathkeys = pathkeys;
 	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 6b010f0b1a5..a2aad4ecba7 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -235,6 +235,7 @@ extern AggPath *create_agg_path(PlannerInfo *root,
 								AggSplit aggsplit,
 								List *groupClause,
 								List *qual,
+								List *pathkeys,
 								const AggClauseCosts *aggcosts,
 								double numGroups);
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
-- 
2.43.0

Attachment: v3-0005-fix-tests-for-IndexAggregate.patch (text/x-patch)
From e3ac8e4b5a635dd9e6563465002fa41c0834ace4 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Thu, 11 Dec 2025 16:06:01 +0300
Subject: [PATCH v3 5/5] fix tests for IndexAggregate

After adding the IndexAggregate node, some test output changed and the
tests broke. This patch updates the expected output.

It also adds some IndexAggregate-specific tests to aggregates.sql and
partition_aggregate.sql.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  39 +-
 src/test/regress/expected/aggregates.out      | 291 +++++++++-
 .../regress/expected/collate.icu.utf8.out     |  16 +-
 src/test/regress/expected/eager_aggregate.out | 539 ++++++++++--------
 src/test/regress/expected/join.out            |  31 +-
 .../regress/expected/partition_aggregate.out  | 361 ++++++++----
 src/test/regress/expected/select_parallel.out |  27 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/aggregates.sql           | 147 ++++-
 src/test/regress/sql/eager_aggregate.sql      |  41 ++
 src/test/regress/sql/join.sql                 |   2 +
 src/test/regress/sql/partition_aggregate.sql  |  31 +-
 src/test/regress/sql/select_parallel.sql      |   3 +
 13 files changed, 1096 insertions(+), 435 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 48e3185b227..de7227a6040 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,33 +3701,30 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
 -- Subquery in FROM clause having aggregate
 explain (verbose, costs off)
 select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
-                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
  Sort
    Output: (count(*)), (sum(ft1_1.c1))
    Sort Key: (count(*)), (sum(ft1_1.c1))
-   ->  Finalize GroupAggregate
+   ->  Finalize IndexAggregate
          Output: count(*), (sum(ft1_1.c1))
          Group Key: (sum(ft1_1.c1))
-         ->  Sort
+         ->  Hash Join
                Output: (sum(ft1_1.c1)), (PARTIAL count(*))
-               Sort Key: (sum(ft1_1.c1))
-               ->  Hash Join
-                     Output: (sum(ft1_1.c1)), (PARTIAL count(*))
-                     Hash Cond: (ft1_1.c2 = ft1.c2)
-                     ->  Foreign Scan
-                           Output: ft1_1.c2, (sum(ft1_1.c1))
-                           Relations: Aggregate on (public.ft1 ft1_1)
-                           Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-                     ->  Hash
-                           Output: ft1.c2, (PARTIAL count(*))
-                           ->  Partial HashAggregate
-                                 Output: ft1.c2, PARTIAL count(*)
-                                 Group Key: ft1.c2
-                                 ->  Foreign Scan on public.ft1
-                                       Output: ft1.c2
-                                       Remote SQL: SELECT c2 FROM "S 1"."T 1"
-(24 rows)
+               Hash Cond: (ft1_1.c2 = ft1.c2)
+               ->  Foreign Scan
+                     Output: ft1_1.c2, (sum(ft1_1.c1))
+                     Relations: Aggregate on (public.ft1 ft1_1)
+                     Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+               ->  Hash
+                     Output: ft1.c2, (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: ft1.c2, PARTIAL count(*)
+                           Group Key: ft1.c2
+                           ->  Foreign Scan on public.ft1
+                                 Output: ft1.c2
+                                 Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(21 rows)
 
 select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
  count |   b   
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index cae8e7bca31..afe01f5da85 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1533,7 +1533,7 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select * from only t1 group by a,b,c,d;
       QUERY PLAN      
 ----------------------
- HashAggregate
+ IndexAggregate
    Group Key: a, b
    ->  Seq Scan on t1
 (3 rows)
@@ -3270,6 +3270,7 @@ FROM generate_series(1, 100) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
 EXPLAIN (COSTS OFF)
@@ -3707,10 +3708,242 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba       |    0 |     1
 (2 rows)
 
+ 
+--
+-- Index Aggregation tests
+--
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: unique1, (sum(two))
+   ->  IndexAggregate
+         Output: unique1, sum(two)
+         Group Key: tenk1.unique1
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ unique1 | sum 
+---------+-----
+       0 |   0
+       1 |   1
+       2 |   0
+       3 |   1
+       4 |   0
+       5 |   1
+       6 |   0
+       7 |   1
+       8 |   0
+       9 |   1
+(10 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, (sum(two))
+   ->  IndexAggregate
+         Output: even, sum(two)
+         Group Key: tenk1.even
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ even | sum 
+------+-----
+    1 |   0
+    3 | 100
+    5 |   0
+    7 | 100
+    9 |   0
+   11 | 100
+   13 |   0
+   15 | 100
+   17 |   0
+   19 | 100
+(10 rows)
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, odd, (sum(unique1))
+   ->  IndexAggregate
+         Output: even, odd, sum(unique1)
+         Group Key: tenk1.even, tenk1.odd
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+ even | odd |  sum   
+------+-----+--------
+    1 |   0 | 495000
+    3 |   2 | 495100
+    5 |   4 | 495200
+    7 |   6 | 495300
+    9 |   8 | 495400
+   11 |  10 | 495500
+   13 |  12 | 495600
+   15 |  14 | 495700
+   17 |  16 | 495800
+   19 |  18 | 495900
+(10 rows)
+
+-- mixing columns between group by and order by
+begin;
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.x, tmp.y
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+ x | y | sum 
+---+---+-----
+ 1 | 8 |   1
+ 2 | 7 |   2
+ 3 | 6 |   3
+ 4 | 5 |   4
+(4 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.y, tmp.x
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+ x | y | sum 
+---+---+-----
+ 4 | 5 |   4
+ 3 | 6 |   3
+ 2 | 7 |   2
+ 1 | 8 |   1
+(4 rows)
+
+--
+-- Index Aggregation Spill tests
+--
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+ unique1 | count | sum  
+---------+-------+------
+    4976 |     1 |  976
+    4977 |     1 |  977
+    4978 |     1 |  978
+    4979 |     1 |  979
+    4980 |     1 |  980
+    4981 |     1 |  981
+    4982 |     1 |  982
+    4983 |     1 |  983
+    4984 |     1 |  984
+    4985 |     1 |  985
+    4986 |     1 |  986
+    4987 |     1 |  987
+    4988 |     1 |  988
+    4989 |     1 |  989
+    4990 |     1 |  990
+    4991 |     1 |  991
+    4992 |     1 |  992
+    4993 |     1 |  993
+    4994 |     1 |  994
+    4995 |     1 |  995
+    4996 |     1 |  996
+    4997 |     1 |  997
+    4998 |     1 |  998
+    4999 |     1 |  999
+    9976 |     1 | 1976
+    9977 |     1 | 1977
+    9978 |     1 | 1978
+    9979 |     1 | 1979
+    9980 |     1 | 1980
+    9981 |     1 | 1981
+    9982 |     1 | 1982
+    9983 |     1 | 1983
+    9984 |     1 | 1984
+    9985 |     1 | 1985
+    9986 |     1 | 1986
+    9987 |     1 | 1987
+    9988 |     1 | 1988
+    9989 |     1 | 1989
+    9990 |     1 | 1990
+    9991 |     1 | 1991
+    9992 |     1 | 1992
+    9993 |     1 | 1993
+    9994 |     1 | 1994
+    9995 |     1 | 1995
+    9996 |     1 | 1996
+    9997 |     1 | 1997
+    9998 |     1 | 1998
+    9999 |     1 | 1999
+(48 rows)
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 --
 -- Hash Aggregation Spill tests
 --
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 select unique1, count(*), sum(twothousand) from tenk1
 group by unique1
@@ -3783,6 +4016,7 @@ select g from generate_series(0, 19999) g;
 analyze agg_data_20k;
 -- Produce results with sorting.
 set enable_hashagg = false;
+set enable_indexagg = false;
 set jit_above_cost = 0;
 explain (costs off)
 select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
@@ -3852,31 +4086,74 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
   from agg_data_2k group by g/2;
 set enable_sort = true;
 set work_mem to default;
+-- Produce results with index aggregation
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+set jit_above_cost = 0;
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+           QUERY PLAN           
+--------------------------------
+ IndexAggregate
+   Group Key: (g % 10000)
+   ->  Seq Scan on agg_data_20k
+(3 rows)
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+set jit_above_cost to default;
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
 -- Compare group aggregation results to hash aggregation results
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
  a | c1 | c2 | c3 
 ---+----+----+----
 (0 rows)
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
@@ -3889,3 +4166,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 8023014fe63..c62e312175c 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2395,8 +2395,8 @@ SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c col
 --------------------------------------------------------------
  Sort
    Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
-   ->  Finalize HashAggregate
-         Group Key: pagg_tab3.c
+   ->  Finalize IndexAggregate
+         Group Key: pagg_tab3.c COLLATE case_insensitive
          ->  Append
                ->  Partial HashAggregate
                      Group Key: pagg_tab3.c
@@ -2613,20 +2613,20 @@ INSERT INTO pagg_tab6 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdC
 ANALYZE pagg_tab6;
 EXPLAIN (COSTS OFF)
 SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
-                      QUERY PLAN                       
--------------------------------------------------------
+                        QUERY PLAN                        
+----------------------------------------------------------
  Sort
    Sort Key: t1.c COLLATE "C"
    ->  Append
-         ->  HashAggregate
-               Group Key: t1.c
+         ->  IndexAggregate
+               Group Key: t1.c COLLATE case_insensitive
                ->  Nested Loop
                      Join Filter: (t1.c = t2.c)
                      ->  Seq Scan on pagg_tab6_p1 t2
                            Filter: (c = b)
                      ->  Seq Scan on pagg_tab5_p1 t1
-         ->  HashAggregate
-               Group Key: t1_1.c
+         ->  IndexAggregate
+               Group Key: t1_1.c COLLATE case_insensitive
                ->  Nested Loop
                      Join Filter: (t1_1.c = t2_1.c)
                      ->  Seq Scan on pagg_tab6_p2 t2_1
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 5ac966186f7..0d4468fa686 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -21,27 +21,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -62,6 +59,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -110,6 +108,53 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg(t2.c))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial IndexAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test eager aggregation over join rel
 --
@@ -121,34 +166,31 @@ SELECT t1.a, avg(t2.c + t3.c)
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
   JOIN eager_agg_t3 t3 ON t2.a = t3.a
 GROUP BY t1.a ORDER BY t1.a;
-                                  QUERY PLAN                                  
-------------------------------------------------------------------------------
- Finalize GroupAggregate
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg((t2.c + t3.c))
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg((t2.c + t3.c))
-                           Group Key: t2.b
-                           ->  Hash Join
-                                 Output: t2.c, t2.b, t3.c
-                                 Hash Cond: (t3.a = t2.a)
-                                 ->  Seq Scan on public.eager_agg_t3 t3
-                                       Output: t3.a, t3.b, t3.c
-                                 ->  Hash
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                     Group Key: t2.b
+                     ->  Hash Join
+                           Output: t2.c, t2.b, t3.c
+                           Hash Cond: (t3.a = t2.a)
+                           ->  Seq Scan on public.eager_agg_t3 t3
+                                 Output: t3.a, t3.b, t3.c
+                           ->  Hash
+                                 Output: t2.c, t2.b, t2.a
+                                 ->  Seq Scan on public.eager_agg_t2 t2
                                        Output: t2.c, t2.b, t2.a
-                                       ->  Seq Scan on public.eager_agg_t2 t2
-                                             Output: t2.c, t2.b, t2.a
-(25 rows)
+(22 rows)
 
 SELECT t1.a, avg(t2.c + t3.c)
   FROM eager_agg_t1 t1
@@ -170,6 +212,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
   FROM eager_agg_t1 t1
@@ -227,6 +270,62 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize IndexAggregate
+   Output: t1.a, avg((t2.c + t3.c))
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+               ->  Partial IndexAggregate
+                     Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                     Group Key: t2.b
+                     ->  Hash Join
+                           Output: t2.c, t2.b, t3.c
+                           Hash Cond: (t3.a = t2.a)
+                           ->  Seq Scan on public.eager_agg_t3 t3
+                                 Output: t3.a, t3.b, t3.c
+                           ->  Hash
+                                 Output: t2.c, t2.b, t2.a
+                                 ->  Seq Scan on public.eager_agg_t2 t2
+                                       Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test that eager aggregation works for outer join
 --
@@ -236,27 +335,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Right Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Right Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -331,30 +427,27 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
- Finalize GroupAggregate
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Gather Merge
+   ->  Gather
          Output: t1.a, (PARTIAL avg(t2.c))
          Workers Planned: 2
-         ->  Sort
+         ->  Parallel Hash Join
                Output: t1.a, (PARTIAL avg(t2.c))
-               Sort Key: t1.a
-               ->  Parallel Hash Join
-                     Output: t1.a, (PARTIAL avg(t2.c))
-                     Hash Cond: (t1.b = t2.b)
-                     ->  Parallel Seq Scan on public.eager_agg_t1 t1
-                           Output: t1.a, t1.b, t1.c
-                     ->  Parallel Hash
-                           Output: t2.b, (PARTIAL avg(t2.c))
-                           ->  Partial HashAggregate
-                                 Output: t2.b, PARTIAL avg(t2.c)
-                                 Group Key: t2.b
-                                 ->  Parallel Seq Scan on public.eager_agg_t2 t2
-                                       Output: t2.a, t2.b, t2.c
-(21 rows)
+               Hash Cond: (t1.b = t2.b)
+               ->  Parallel Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b, t1.c
+               ->  Parallel Hash
+                     Output: t2.b, (PARTIAL avg(t2.c))
+                     ->  Partial HashAggregate
+                           Output: t2.b, PARTIAL avg(t2.c)
+                           Group Key: t2.b
+                           ->  Parallel Seq Scan on public.eager_agg_t2 t2
+                                 Output: t2.a, t2.b, t2.c
+(18 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -387,27 +480,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -696,79 +786,77 @@ SELECT t1.x, sum(t2.y + t3.y)
   JOIN eager_agg_tab1 t2 ON t1.x = t2.x
   JOIN eager_agg_tab1 t3 ON t2.x = t3.x
 GROUP BY t1.x ORDER BY t1.x;
-                                        QUERY PLAN                                         
--------------------------------------------------------------------------------------------
- Sort
-   Output: t1.x, (sum((t2.y + t3.y)))
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Merge Append
    Sort Key: t1.x
-   ->  Append
-         ->  Finalize HashAggregate
-               Output: t1.x, sum((t2.y + t3.y))
-               Group Key: t1.x
-               ->  Hash Join
-                     Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
-                     Hash Cond: (t1.x = t2.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p1 t1
-                           Output: t1.x
-                     ->  Hash
-                           Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
-                                 Group Key: t2.x
-                                 ->  Hash Join
-                                       Output: t2.y, t2.x, t3.y, t3.x
-                                       Hash Cond: (t2.x = t3.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p1 t2
-                                             Output: t2.y, t2.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1.x, sum((t2.y + t3.y))
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+               Hash Cond: (t1.x = t2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p1 t1
+                     Output: t1.x
+               ->  Hash
+                     Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+                     ->  Partial HashAggregate
+                           Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+                           Group Key: t2.x
+                           ->  Hash Join
+                                 Output: t2.y, t2.x, t3.y, t3.x
+                                 Hash Cond: (t2.x = t3.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p1 t2
+                                       Output: t2.y, t2.x
+                                 ->  Hash
+                                       Output: t3.y, t3.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t3
                                              Output: t3.y, t3.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t3
-                                                   Output: t3.y, t3.x
-         ->  Finalize HashAggregate
-               Output: t1_1.x, sum((t2_1.y + t3_1.y))
-               Group Key: t1_1.x
-               ->  Hash Join
-                     Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
-                     Hash Cond: (t1_1.x = t2_1.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
-                           Output: t1_1.x
-                     ->  Hash
-                           Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
-                                 Group Key: t2_1.x
-                                 ->  Hash Join
-                                       Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                       Hash Cond: (t2_1.x = t3_1.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
-                                             Output: t2_1.y, t2_1.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1_1.x, sum((t2_1.y + t3_1.y))
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+                           Group Key: t2_1.x
+                           ->  Hash Join
+                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+                                 Hash Cond: (t2_1.x = t3_1.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Hash
+                                       Output: t3_1.y, t3_1.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
                                              Output: t3_1.y, t3_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
-                                                   Output: t3_1.y, t3_1.x
-         ->  Finalize HashAggregate
-               Output: t1_2.x, sum((t2_2.y + t3_2.y))
-               Group Key: t1_2.x
-               ->  Hash Join
-                     Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
-                     Hash Cond: (t1_2.x = t2_2.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
-                           Output: t1_2.x
-                     ->  Hash
-                           Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
-                                 Group Key: t2_2.x
-                                 ->  Hash Join
-                                       Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                       Hash Cond: (t2_2.x = t3_2.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
-                                             Output: t2_2.y, t2_2.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1_2.x, sum((t2_2.y + t3_2.y))
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+                           Group Key: t2_2.x
+                           ->  Hash Join
+                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+                                 Hash Cond: (t2_2.x = t3_2.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Hash
+                                       Output: t3_2.y, t3_2.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
                                              Output: t3_2.y, t3_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
-                                                   Output: t3_2.y, t3_2.x
-(70 rows)
+(68 rows)
 
 SELECT t1.x, sum(t2.y + t3.y)
   FROM eager_agg_tab1 t1
@@ -803,97 +891,46 @@ SELECT t3.y, sum(t2.y + t3.y)
   JOIN eager_agg_tab1 t2 ON t1.x = t2.x
   JOIN eager_agg_tab1 t3 ON t2.x = t3.x
 GROUP BY t3.y ORDER BY t3.y;
-                                        QUERY PLAN                                         
--------------------------------------------------------------------------------------------
- Finalize GroupAggregate
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t3.y, sum((t2.y + t3.y))
    Group Key: t3.y
-   ->  Sort
+   ->  Hash Join
          Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
-         Sort Key: t3.y
+         Hash Cond: (t1.x = t2.x)
          ->  Append
-               ->  Hash Join
-                     Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
-                     Hash Cond: (t2.x = t1.x)
-                     ->  Partial GroupAggregate
-                           Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
-                           Group Key: t2.x, t3.y, t3.x
-                           ->  Incremental Sort
-                                 Output: t2.y, t2.x, t3.y, t3.x
-                                 Sort Key: t2.x, t3.y
-                                 Presorted Key: t2.x
-                                 ->  Merge Join
-                                       Output: t2.y, t2.x, t3.y, t3.x
-                                       Merge Cond: (t2.x = t3.x)
-                                       ->  Sort
-                                             Output: t2.y, t2.x
-                                             Sort Key: t2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t2
-                                                   Output: t2.y, t2.x
-                                       ->  Sort
-                                             Output: t3.y, t3.x
-                                             Sort Key: t3.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t3
-                                                   Output: t3.y, t3.x
-                     ->  Hash
-                           Output: t1.x
-                           ->  Seq Scan on public.eager_agg_tab1_p1 t1
-                                 Output: t1.x
-               ->  Hash Join
-                     Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
-                     Hash Cond: (t2_1.x = t1_1.x)
-                     ->  Partial GroupAggregate
-                           Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
-                           Group Key: t2_1.x, t3_1.y, t3_1.x
-                           ->  Incremental Sort
-                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                 Sort Key: t2_1.x, t3_1.y
-                                 Presorted Key: t2_1.x
-                                 ->  Merge Join
-                                       Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                       Merge Cond: (t2_1.x = t3_1.x)
-                                       ->  Sort
-                                             Output: t2_1.y, t2_1.x
-                                             Sort Key: t2_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
-                                                   Output: t2_1.y, t2_1.x
-                                       ->  Sort
+               ->  Seq Scan on public.eager_agg_tab1_p1 t1_1
+                     Output: t1_1.x
+               ->  Seq Scan on public.eager_agg_tab1_p2 t1_2
+                     Output: t1_2.x
+               ->  Seq Scan on public.eager_agg_tab1_p3 t1_3
+                     Output: t1_3.x
+         ->  Hash
+               Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y)))
+               ->  Partial IndexAggregate
+                     Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+                     Group Key: t2.x, t3.y, t3.x
+                     ->  Hash Join
+                           Output: t2.y, t2.x, t3.y, t3.x
+                           Hash Cond: (t2.x = t3.x)
+                           ->  Append
+                                 ->  Seq Scan on public.eager_agg_tab1_p1 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Seq Scan on public.eager_agg_tab1_p2 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Seq Scan on public.eager_agg_tab1_p3 t2_3
+                                       Output: t2_3.y, t2_3.x
+                           ->  Hash
+                                 Output: t3.y, t3.x
+                                 ->  Append
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t3_1
                                              Output: t3_1.y, t3_1.x
-                                             Sort Key: t3_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
-                                                   Output: t3_1.y, t3_1.x
-                     ->  Hash
-                           Output: t1_1.x
-                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
-                                 Output: t1_1.x
-               ->  Hash Join
-                     Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
-                     Hash Cond: (t2_2.x = t1_2.x)
-                     ->  Partial GroupAggregate
-                           Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
-                           Group Key: t2_2.x, t3_2.y, t3_2.x
-                           ->  Incremental Sort
-                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                 Sort Key: t2_2.x, t3_2.y
-                                 Presorted Key: t2_2.x
-                                 ->  Merge Join
-                                       Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                       Merge Cond: (t2_2.x = t3_2.x)
-                                       ->  Sort
-                                             Output: t2_2.y, t2_2.x
-                                             Sort Key: t2_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
-                                                   Output: t2_2.y, t2_2.x
-                                       ->  Sort
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t3_2
                                              Output: t3_2.y, t3_2.x
-                                             Sort Key: t3_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
-                                                   Output: t3_2.y, t3_2.x
-                     ->  Hash
-                           Output: t1_2.x
-                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
-                                 Output: t1_2.x
-(88 rows)
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t3_3
+                                             Output: t3_3.y, t3_3.x
+(37 rows)
 
 SELECT t3.y, sum(t2.y + t3.y)
   FROM eager_agg_tab1 t1
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index edde9e99893..a9a53e4bac7 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2830,6 +2830,7 @@ select count(*) from
 set enable_hashjoin = 0;
 set enable_nestloop = 0;
 set enable_hashagg = 0;
+set enable_indexagg = 0;
 --
 -- Check that we use the pathkeys from a prefix of the group by / order by
 -- clause for the join pathkeys when that prefix covers all join quals.  We
@@ -2857,6 +2858,7 @@ order by x.thousand desc, x.twothousand;
                      ->  Seq Scan on tenk1 x
 (13 rows)
 
+reset enable_indexagg;
 reset enable_hashagg;
 reset enable_nestloop;
 reset enable_hashjoin;
@@ -9534,23 +9536,20 @@ inner join (select distinct id from j3) j3 on j1.id = j3.id;
 explain (verbose, costs off)
 select * from j1
 inner join (select id from j3 group by id) j3 on j1.id = j3.id;
-               QUERY PLAN                
------------------------------------------
+            QUERY PLAN             
+-----------------------------------
  Nested Loop
    Output: j1.id, j3.id
    Inner Unique: true
    Join Filter: (j1.id = j3.id)
-   ->  Group
+   ->  IndexAggregate
          Output: j3.id
          Group Key: j3.id
-         ->  Sort
+         ->  Seq Scan on public.j3
                Output: j3.id
-               Sort Key: j3.id
-               ->  Seq Scan on public.j3
-                     Output: j3.id
    ->  Seq Scan on public.j1
          Output: j1.id
-(14 rows)
+(11 rows)
 
 drop table j1;
 drop table j2;
@@ -9867,16 +9866,14 @@ EXPLAIN (COSTS OFF)
 SELECT 1 FROM group_tbl t1
     LEFT JOIN (SELECT a c1, COALESCE(a, a) c2 FROM group_tbl t2) s ON TRUE
 GROUP BY s.c1, s.c2;
-                   QUERY PLAN                   
-------------------------------------------------
- Group
+                QUERY PLAN                 
+-------------------------------------------
+ IndexAggregate
    Group Key: t2.a, (COALESCE(t2.a, t2.a))
-   ->  Sort
-         Sort Key: t2.a, (COALESCE(t2.a, t2.a))
-         ->  Nested Loop Left Join
-               ->  Seq Scan on group_tbl t1
-               ->  Seq Scan on group_tbl t2
-(7 rows)
+   ->  Nested Loop Left Join
+         ->  Seq Scan on group_tbl t1
+         ->  Seq Scan on group_tbl t2
+(5 rows)
 
 DROP TABLE group_tbl;
 --
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index c30304b99c7..fce941ae1f0 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -150,7 +150,7 @@ EXPLAIN (COSTS OFF)
 SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
              QUERY PLAN             
 ------------------------------------
- HashAggregate
+ IndexAggregate
    Group Key: c
    ->  Result
          Replaces: Scan on pagg_tab
@@ -177,8 +177,9 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 ---+-----
 (0 rows)
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
 SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
@@ -370,6 +371,150 @@ SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
    250
 (1 row)
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.c, (sum(pagg_tab.a)), (avg(pagg_tab.b))
+   ->  Append
+         ->  IndexAggregate
+               Group Key: pagg_tab.c
+               Filter: (avg(pagg_tab.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+         ->  IndexAggregate
+               Group Key: pagg_tab_1.c
+               Filter: (avg(pagg_tab_1.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+         ->  IndexAggregate
+               Group Key: pagg_tab_2.c
+               Filter: (avg(pagg_tab_2.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(16 rows)
+
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+  c   | sum  |         avg         | count 
+------+------+---------------------+-------
+ 0000 | 2000 | 12.0000000000000000 |   250
+ 0001 | 2250 | 13.0000000000000000 |   250
+ 0002 | 2500 | 14.0000000000000000 |   250
+ 0006 | 2500 | 12.0000000000000000 |   250
+ 0007 | 2750 | 13.0000000000000000 |   250
+ 0008 | 2000 | 14.0000000000000000 |   250
+(6 rows)
+
+-- When GROUP BY clause does not match; top finalize node is required
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.a, (sum(pagg_tab.b)), (avg(pagg_tab.b))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab.a
+         Filter: (avg(pagg_tab.d) < '15'::numeric)
+         ->  Merge Append
+               Sort Key: pagg_tab.a
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab.a
+                     ->  Seq Scan on pagg_tab_p1 pagg_tab
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab_1.a
+                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab_2.a
+                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(17 rows)
+
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+ a  | sum  |         avg         | count 
+----+------+---------------------+-------
+  0 | 1500 | 10.0000000000000000 |   150
+  1 | 1650 | 11.0000000000000000 |   150
+  2 | 1800 | 12.0000000000000000 |   150
+  3 | 1950 | 13.0000000000000000 |   150
+  4 | 2100 | 14.0000000000000000 |   150
+ 10 | 1500 | 10.0000000000000000 |   150
+ 11 | 1650 | 11.0000000000000000 |   150
+ 12 | 1800 | 12.0000000000000000 |   150
+ 13 | 1950 | 13.0000000000000000 |   150
+ 14 | 2100 | 14.0000000000000000 |   150
+(10 rows)
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+                   QUERY PLAN                   
+------------------------------------------------
+ Merge Append
+   Sort Key: pagg_tab.c
+   ->  IndexAggregate
+         Group Key: pagg_tab.c
+         ->  Seq Scan on pagg_tab_p1 pagg_tab
+   ->  IndexAggregate
+         Group Key: pagg_tab_1.c
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+   ->  IndexAggregate
+         Group Key: pagg_tab_2.c
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(11 rows)
+
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+  c   
+------
+ 0000
+ 0001
+ 0002
+ 0003
+ 0004
+ 0005
+ 0006
+ 0007
+ 0008
+ 0009
+ 0010
+ 0011
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Group
+   Group Key: pagg_tab.a
+   ->  Merge Append
+         Sort Key: pagg_tab.a
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab.a
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+                     Filter: (a < 3)
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab_1.a
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+                     Filter: (a < 3)
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab_2.a
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+                     Filter: (a < 3)
+(16 rows)
+
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+ a 
+---
+ 0
+ 1
+ 2
+(3 rows)
+
+RESET enable_sort;
 RESET enable_hashagg;
 -- ROLLUP, partitionwise aggregation does not apply
 EXPLAIN (COSTS OFF)
@@ -554,6 +699,7 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
                                QUERY PLAN                                
@@ -606,41 +752,40 @@ SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 (6 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
 -- LEFT JOIN, should produce partial partitionwise aggregation plan as
 -- GROUP BY is on nullable column
 EXPLAIN (COSTS OFF)
 SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY b.y ORDER BY 1 NULLS LAST;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Group Key: b.y
-   ->  Sort
-         Sort Key: b.y
-         ->  Append
-               ->  Partial HashAggregate
-                     Group Key: b.y
-                     ->  Hash Left Join
-                           Hash Cond: (a.x = b.y)
-                           ->  Seq Scan on pagg_tab1_p1 a
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p1 b
-               ->  Partial HashAggregate
-                     Group Key: b_1.y
-                     ->  Hash Left Join
-                           Hash Cond: (a_1.x = b_1.y)
-                           ->  Seq Scan on pagg_tab1_p2 a_1
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p2 b_1
-               ->  Partial HashAggregate
-                     Group Key: b_2.y
-                     ->  Hash Right Join
-                           Hash Cond: (b_2.y = a_2.x)
-                           ->  Seq Scan on pagg_tab2_p3 b_2
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab1_p3 a_2
-(26 rows)
+   ->  Append
+         ->  Partial HashAggregate
+               Group Key: b.y
+               ->  Hash Left Join
+                     Hash Cond: (a.x = b.y)
+                     ->  Seq Scan on pagg_tab1_p1 a
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 b
+         ->  Partial HashAggregate
+               Group Key: b_1.y
+               ->  Hash Left Join
+                     Hash Cond: (a_1.x = b_1.y)
+                     ->  Seq Scan on pagg_tab1_p2 a_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 b_1
+         ->  Partial HashAggregate
+               Group Key: b_2.y
+               ->  Hash Right Join
+                     Hash Cond: (b_2.y = a_2.x)
+                     ->  Seq Scan on pagg_tab2_p3 b_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 a_2
+(24 rows)
 
 SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY b.y ORDER BY 1 NULLS LAST;
  y  | sum  
@@ -704,35 +849,33 @@ SELECT b.y, sum(a.y) FROM pagg_tab1 a RIGHT JOIN pagg_tab2 b ON a.x = b.y GROUP
 -- GROUP BY is on nullable column
 EXPLAIN (COSTS OFF)
 SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x ORDER BY 1 NULLS LAST;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Group Key: a.x
-   ->  Sort
-         Sort Key: a.x
-         ->  Append
-               ->  Partial HashAggregate
-                     Group Key: a.x
-                     ->  Hash Full Join
-                           Hash Cond: (a.x = b.y)
-                           ->  Seq Scan on pagg_tab1_p1 a
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p1 b
-               ->  Partial HashAggregate
-                     Group Key: a_1.x
-                     ->  Hash Full Join
-                           Hash Cond: (a_1.x = b_1.y)
-                           ->  Seq Scan on pagg_tab1_p2 a_1
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p2 b_1
-               ->  Partial HashAggregate
-                     Group Key: a_2.x
-                     ->  Hash Full Join
-                           Hash Cond: (b_2.y = a_2.x)
-                           ->  Seq Scan on pagg_tab2_p3 b_2
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab1_p3 a_2
-(26 rows)
+   ->  Append
+         ->  Partial HashAggregate
+               Group Key: a.x
+               ->  Hash Full Join
+                     Hash Cond: (a.x = b.y)
+                     ->  Seq Scan on pagg_tab1_p1 a
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 b
+         ->  Partial HashAggregate
+               Group Key: a_1.x
+               ->  Hash Full Join
+                     Hash Cond: (a_1.x = b_1.y)
+                     ->  Seq Scan on pagg_tab1_p2 a_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 b_1
+         ->  Partial HashAggregate
+               Group Key: a_2.x
+               ->  Hash Full Join
+                     Hash Cond: (b_2.y = a_2.x)
+                     ->  Seq Scan on pagg_tab2_p3 b_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 a_2
+(24 rows)
 
 SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x ORDER BY 1 NULLS LAST;
  x  | sum  
@@ -839,16 +982,14 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOI
 -- Empty join relation because of empty outer side, no partitionwise agg plan
 EXPLAIN (COSTS OFF)
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
-                  QUERY PLAN                  
-----------------------------------------------
- GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: pagg_tab1.y
-   ->  Sort
-         Sort Key: pagg_tab1.y
-         ->  Result
-               Replaces: Join on b, pagg_tab1
-               One-Time Filter: false
-(7 rows)
+   ->  Result
+         Replaces: Join on b, pagg_tab1
+         One-Time Filter: false
+(5 rows)
 
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
  x | y | count 
@@ -869,7 +1010,7 @@ SELECT a, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY a HAVING avg(c) < 22
 --------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_m.a, (sum(pagg_tab_m.b)), (avg(pagg_tab_m.c))
-   ->  Finalize HashAggregate
+   ->  Finalize IndexAggregate
          Group Key: pagg_tab_m.a
          Filter: (avg(pagg_tab_m.c) < '22'::numeric)
          ->  Append
@@ -1067,8 +1208,8 @@ RESET parallel_setup_cost;
 -- PARTITION KEY, thus we will have a partial aggregation for them.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_ml.a, (sum(pagg_tab_ml.b)), (count(*))
    ->  Append
@@ -1076,31 +1217,27 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER B
                Group Key: pagg_tab_ml.a
                Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
                ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
-         ->  Finalize GroupAggregate
+         ->  Finalize IndexAggregate
                Group Key: pagg_tab_ml_2.a
                Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_ml_2.a
-                     ->  Append
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_2.a
-                                 ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_3.a
-                                 ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
-         ->  Finalize GroupAggregate
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_2.a
+                           ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_3.a
+                           ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         ->  Finalize IndexAggregate
                Group Key: pagg_tab_ml_5.a
                Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_ml_5.a
-                     ->  Append
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_5.a
-                                 ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_6.a
-                                 ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
-(31 rows)
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_5.a
+                           ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_6.a
+                           ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
+(27 rows)
 
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
  a  | sum  | count 
@@ -1120,31 +1257,29 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER B
 -- PARTITION KEY
 EXPLAIN (COSTS OFF)
 SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b ORDER BY 1, 2, 3;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_ml.b, (sum(pagg_tab_ml.a)), (count(*))
-   ->  Finalize GroupAggregate
+   ->  Finalize IndexAggregate
          Group Key: pagg_tab_ml.b
-         ->  Sort
-               Sort Key: pagg_tab_ml.b
-               ->  Append
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml.b
-                           ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_1.b
-                           ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_2.b
-                           ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_3.b
-                           ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_4.b
-                           ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
-(22 rows)
+         ->  Append
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml.b
+                     ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_1.b
+                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_2.b
+                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_3.b
+                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_4.b
+                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
+(20 rows)
 
 SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b HAVING avg(a) < 15 ORDER BY 1, 2, 3;
  b |  sum  | count 
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..0318863bf1f 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -706,18 +706,16 @@ alter table tenk2 reset (parallel_workers);
 set enable_hashagg = false;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
-                     QUERY PLAN                     
-----------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Finalize GroupAggregate
    Group Key: twenty
    ->  Gather Merge
          Workers Planned: 4
-         ->  Partial GroupAggregate
+         ->  Partial IndexAggregate
                Group Key: twenty
-               ->  Sort
-                     Sort Key: twenty
-                     ->  Parallel Seq Scan on tenk1
-(9 rows)
+               ->  Parallel Seq Scan on tenk1
+(7 rows)
 
 select count(*) from tenk1 group by twenty;
  count 
@@ -772,19 +770,17 @@ drop function sp_simple_func(integer);
 -- test handling of SRFs in targetlist (bug in 10.0)
 explain (costs off)
    select count(*), generate_series(1,2) from tenk1 group by twenty;
-                        QUERY PLAN                        
-----------------------------------------------------------
+                     QUERY PLAN                     
+----------------------------------------------------
  ProjectSet
    ->  Finalize GroupAggregate
          Group Key: twenty
          ->  Gather Merge
                Workers Planned: 4
-               ->  Partial GroupAggregate
+               ->  Partial IndexAggregate
                      Group Key: twenty
-                     ->  Sort
-                           Sort Key: twenty
-                           ->  Parallel Seq Scan on tenk1
-(10 rows)
+                     ->  Parallel Seq Scan on tenk1
+(8 rows)
 
 select count(*), generate_series(1,2) from tenk1 group by twenty;
  count | generate_series 
@@ -833,6 +829,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
                      QUERY PLAN                     
@@ -876,6 +873,7 @@ select count(*) from tenk1 group by twenty;
 reset parallel_leader_participation;
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 explain (costs off)
 select * from
   (select string4, count(unique2)
@@ -917,6 +915,7 @@ select * from
 (12 rows)
 
 reset enable_material;
+reset enable_indexagg;
 reset enable_hashagg;
 -- check parallelized int8 aggregate (bug #14897)
 explain (costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..d32bec316d3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -157,6 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashagg                 | on
  enable_hashjoin                | on
  enable_incremental_sort        | on
+ enable_indexagg                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
  enable_material                | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(25 rows)
+(26 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 850f5a5787f..f72eb367112 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1392,6 +1392,7 @@ CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -1623,12 +1624,100 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+ 
+--
+-- Index Aggregation tests
+--
+
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+-- mixing columns between group by and order by
+begin;
+
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+--
+-- Index Aggregation Spill tests
+--
+
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 
 --
 -- Hash Aggregation Spill tests
 --
 
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 
 select unique1, count(*), sum(twothousand) from tenk1
@@ -1657,6 +1746,7 @@ analyze agg_data_20k;
 -- Produce results with sorting.
 
 set enable_hashagg = false;
+set enable_indexagg = false;
 
 set jit_above_cost = 0;
 
@@ -1728,23 +1818,68 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
 set enable_sort = true;
 set work_mem to default;
 
+-- Produce results with index aggregation
+
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+
+set jit_above_cost = 0;
+
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+
+set jit_above_cost to default;
+
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
 -- Compare group aggregation results to hash aggregation results
 
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
 
 drop table agg_group_1;
 drop table agg_group_2;
@@ -1754,3 +1889,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index abe6d6ae09f..f9f4b5dcebd 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -35,6 +35,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
@@ -48,6 +49,25 @@ SELECT t1.a, avg(t2.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 
 --
@@ -71,6 +91,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
@@ -86,7 +107,27 @@ SELECT t1.a, avg(t2.c + t3.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 --
 -- Test that eager aggregation works for outer join
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 7ec84f3b143..98b3dfcc3cc 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -605,6 +605,7 @@ select count(*) from
 set enable_hashjoin = 0;
 set enable_nestloop = 0;
 set enable_hashagg = 0;
+set enable_indexagg = 0;
 
 --
 -- Check that we use the pathkeys from a prefix of the group by / order by
@@ -617,6 +618,7 @@ from tenk1 x inner join tenk1 y on x.thousand = y.thousand
 group by x.thousand, x.twothousand
 order by x.thousand desc, x.twothousand;
 
+reset enable_indexagg;
 reset enable_hashagg;
 reset enable_nestloop;
 reset enable_hashjoin;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index 7c725e2663a..570aac38fc5 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -55,8 +55,9 @@ EXPLAIN (COSTS OFF)
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
@@ -81,6 +82,32 @@ EXPLAIN (COSTS OFF)
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- When GROUP BY clause does not match; top finalize node is required
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+
+RESET enable_sort;
 RESET enable_hashagg;
 
 -- ROLLUP, partitionwise aggregation does not apply
@@ -135,10 +162,12 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 RESET enable_hashagg;
+RESET enable_indexagg;
 
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 71a75bc86ea..5f398219166 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -318,6 +318,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 
 explain (costs off)
    select count(*) from tenk1 group by twenty;
@@ -328,6 +329,7 @@ reset parallel_leader_participation;
 
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 
 explain (costs off)
 select * from
@@ -341,6 +343,7 @@ select * from
   right join (values (1),(2),(3)) v(x) on true;
 
 reset enable_material;
+reset enable_indexagg;
 
 reset enable_hashagg;
 
-- 
2.43.0

#8Andrei Lepikhov
lepihov@gmail.com
In reply to: Sergey Soloviev (#7)
Re: Introduce Index Aggregate - new GROUP BY strategy

On 12/12/25 17:23, Sergey Soloviev wrote:

This logic is placed in another patch file just to make review of this
change easier.

Also, the cost calculation logic has been adjusted a bit: it now accounts
for top-down index traversal, and the final external-merge cost is added
only when a spill is expected.

Hi,

1. Your 0002 patch needs a trivial rebase.
2. There is trailing whitespace in multiple places throughout the patch
set. Please remove it; you can configure your IDE to strip it automatically.

--
regards, Andrei Lepikhov,
pgEdge

#9Andrei Lepikhov
lepihov@gmail.com
In reply to: Sergey Soloviev (#7)
Re: Introduce Index Aggregate - new GROUP BY strategy

On 12/12/25 17:23, Sergey Soloviev wrote:

Also, the cost calculation logic has been adjusted a bit: it now accounts
for top-down index traversal, and the final external-merge cost is added
only when a spill is expected.

Hi,
Here is my 'aerial' review:
The patch proposes a new aggregation strategy that builds an in-memory
B+tree index for grouping. This combines incremental group formation
(like AGG_HASHED) with sorted output (like AGG_SORTED), which is
beneficial when the query requires both grouping and ordering on
(almost) the same columns.
The key advantage is avoiding a separate sort step when the sorted
output is needed, at the cost of additional CPU overhead.

My doubts:
1. Can you benchmark the scenario where the optimiser mispredicts
numGroups? If the planner underestimates group cardinality, the btree
overhead could be much higher than expected. Does the approach degrade
gracefully?
2. Consider splitting the hash_* → spill_* field renaming into a
separate preparatory commit to reduce the complexity of reviewing the
core logic changes.
3. I notice AGG_INDEX requires both sortable AND hashable types. While I
understand this is for the hash-based spill partitioning, is this
limitation necessary? Could you use sort-based spilling (similar to
tuplesort's external merge) instead? This would allow AGG_INDEX to work
with sortable-only types (I can imagine a geometric type with B-tree
operators but no hash functions).

The main question for me is: can you invent a robust cost model that sets
smooth boundaries between all three grouping strategies? Does it really
promise frequent wins while avoiding regressions? Remember that by
enlarging the search space we also increase planning time, which may be
noticeable in cases with many groupings/grouping attributes; for example,
an APPEND over a partitioned table with a pushed-down aggregate looks like
a trivial case.

--
regards, Andrei Lepikhov,
pgEdge

#10Sergey Soloviev
sergey.soloviev@tantorlabs.ru
In reply to: Andrei Lepikhov (#9)
5 attachment(s)
Re: Introduce Index Aggregate - new GROUP BY strategy

Hi!

Sorry for the late reply; I didn't notice your email.

Here is my 'aerial' review

Yes. You are right.

Can you benchmark the scenario where the optimiser mispredicts numGroups? If the planner underestimates group cardinality, the btree overhead could be much higher than expected. Does the approach degrade gracefully?

 I will try

2. Consider splitting the hash_* → spill_* field renaming into a separate preparatory commit to reduce the complexity of reviewing the core logic changes.

Will be done

3. I notice AGG_INDEX requires both sortable AND hashable types. While I understand this is for the hash-based spill partitioning, is this limitation necessary? Could you use sort-based spilling (similar to tuplesort's external merge) instead? This would allow AGG_INDEX to work with sortable-only types (I can imagine a geometric type with B-tree operators but no hash functions).

I think this is possible if we can use the aggregate's combine function. I have not thought this through yet.

---

A few days ago I implemented a T-tree as the internal index instead of a B+tree. To my surprise, performance degraded. Here are the benchmark results ("amount" is the number of groups; values are latencies in ms).

int

| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100    | 0.222   | 0.199    | 0.198    |
| 1000   | 1.506   | 1.506    | 1.414    |
| 10000  | 15.414  | 15.598   | 15.891   |
| 100000 | 159.625 | 171.507  | 194.401  |

bigint

| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100    | 0.220   | 0.198    | 0.196    |
| 1000   | 1.504   | 1.514    | 1.419    |
| 10000  | 15.404  | 15.717   | 15.836   |
| 100000 | 160.323 | 172.922  | 193.799  |

text

| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100    | 0.280   | 0.301    | 0.287    |
| 1000   | 2.267   | 2.954    | 2.734    |
| 10000  | 24.613  | 35.383   | 35.401   |
| 100000 | 270.657 | 430.929  | 485.113  |

uuid

| amount | HashAgg | GroupAgg | IndexAgg |
| ------ | ------- | -------- | -------- |
| 100    | 0.311   | 0.317    | 0.310    |
| 1000   | 2.827   | 2.667    | 2.675    |
| 10000  | 33.233  | 26.980   | 28.848   |
| 100000 | 437.452 | 287.236  | 363.142  |

Notice how latency increases once the number of groups reaches 100K. This is probably due to the low branching factor of the T-tree: unlike a B+tree, each node has only 2 children, so more nodes must be traversed per lookup.
I also do not rule out a problem in my code, e.g. some paths are not optimized, or a bug causes the tree to become imbalanced.
As another attempt, I will try to implement a plain B-tree (not a B+tree).
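The branching-factor argument can be illustrated with a quick back-of-the-envelope sketch (not part of the patch; the B+tree fanout of 100 is an assumed, typical in-memory value): a balanced binary tree over N keys is roughly log2(N) levels deep, while a tree with fanout B is only about log_B(N) levels deep, so each lookup in a binary T-tree touches far more nodes.

```python
import math

def depth(n_keys: int, fanout: int) -> int:
    """Approximate depth of a balanced search tree holding n_keys
    with the given per-node fanout."""
    return math.ceil(math.log(n_keys, fanout))

n = 100_000
binary_depth = depth(n, 2)    # T-tree: every node has 2 children
bplus_depth = depth(n, 100)   # B+tree with an assumed fanout of ~100

print(f"T-tree:  ~{binary_depth} node visits per lookup")
print(f"B+tree:  ~{bplus_depth} node visits per lookup")
```

With 100,000 groups this gives a depth of about 17 for the binary tree versus 3 for the wide tree, which is consistent with the latency gap widening at the 100K row of the tables above.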

The patches are in attachments.

---
Sergey Soloviev

TantorLabs: https://tantorlabs.com

Attachments:

v4-0005-fix-tests-for-IndexAggregate.patch (text/x-patch; charset=UTF-8)
From a26f6f8c1898cf212b30c33c10dfceeedd474c2c Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Thu, 11 Dec 2025 16:06:01 +0300
Subject: [PATCH v4 5/5] fix tests for IndexAggregate

After adding IndexAggregate node some test output changed and tests
broke. This patch updates expected output.

Also it adds some IndexAggregate specific tests into aggregates.sql and
partition_aggregate.sql.
---
 .../postgres_fdw/expected/postgres_fdw.out    |  39 +-
 src/test/regress/expected/aggregates.out      | 291 +++++++++-
 .../regress/expected/collate.icu.utf8.out     |  16 +-
 src/test/regress/expected/eager_aggregate.out | 539 ++++++++++--------
 src/test/regress/expected/join.out            |  31 +-
 .../regress/expected/partition_aggregate.out  | 361 ++++++++----
 src/test/regress/expected/select_parallel.out |  27 +-
 src/test/regress/expected/sysviews.out        |   3 +-
 src/test/regress/sql/aggregates.sql           | 147 ++++-
 src/test/regress/sql/eager_aggregate.sql      |  41 ++
 src/test/regress/sql/join.sql                 |   2 +
 src/test/regress/sql/partition_aggregate.sql  |  31 +-
 src/test/regress/sql/select_parallel.sql      |   3 +
 13 files changed, 1096 insertions(+), 435 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6066510c7c0..0a03140a80b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,33 +3701,30 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
 -- Subquery in FROM clause having aggregate
 explain (verbose, costs off)
 select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
-                                       QUERY PLAN                                        
------------------------------------------------------------------------------------------
+                                    QUERY PLAN                                     
+-----------------------------------------------------------------------------------
  Sort
    Output: (count(*)), (sum(ft1_1.c1))
    Sort Key: (count(*)), (sum(ft1_1.c1))
-   ->  Finalize GroupAggregate
+   ->  Finalize IndexAggregate
          Output: count(*), (sum(ft1_1.c1))
          Group Key: (sum(ft1_1.c1))
-         ->  Sort
+         ->  Hash Join
                Output: (sum(ft1_1.c1)), (PARTIAL count(*))
-               Sort Key: (sum(ft1_1.c1))
-               ->  Hash Join
-                     Output: (sum(ft1_1.c1)), (PARTIAL count(*))
-                     Hash Cond: (ft1_1.c2 = ft1.c2)
-                     ->  Foreign Scan
-                           Output: ft1_1.c2, (sum(ft1_1.c1))
-                           Relations: Aggregate on (public.ft1 ft1_1)
-                           Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-                     ->  Hash
-                           Output: ft1.c2, (PARTIAL count(*))
-                           ->  Partial HashAggregate
-                                 Output: ft1.c2, PARTIAL count(*)
-                                 Group Key: ft1.c2
-                                 ->  Foreign Scan on public.ft1
-                                       Output: ft1.c2
-                                       Remote SQL: SELECT c2 FROM "S 1"."T 1"
-(24 rows)
+               Hash Cond: (ft1_1.c2 = ft1.c2)
+               ->  Foreign Scan
+                     Output: ft1_1.c2, (sum(ft1_1.c1))
+                     Relations: Aggregate on (public.ft1 ft1_1)
+                     Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+               ->  Hash
+                     Output: ft1.c2, (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: ft1.c2, PARTIAL count(*)
+                           Group Key: ft1.c2
+                           ->  Foreign Scan on public.ft1
+                                 Output: ft1.c2
+                                 Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(21 rows)
 
 select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
  count |   b   
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index cae8e7bca31..afe01f5da85 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1533,7 +1533,7 @@ explain (costs off) select * from t1 group by a,b,c,d;
 explain (costs off) select * from only t1 group by a,b,c,d;
       QUERY PLAN      
 ----------------------
- HashAggregate
+ IndexAggregate
    Group Key: a, b
    ->  Seq Scan on t1
 (3 rows)
@@ -3270,6 +3270,7 @@ FROM generate_series(1, 100) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 -- Utilize the ordering of index scan to avoid a Sort operation
 EXPLAIN (COSTS OFF)
@@ -3707,10 +3708,242 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba       |    0 |     1
 (2 rows)
 
+ 
+--
+-- Index Aggregation tests
+--
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: unique1, (sum(two))
+   ->  IndexAggregate
+         Output: unique1, sum(two)
+         Group Key: tenk1.unique1
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ unique1 | sum 
+---------+-----
+       0 |   0
+       1 |   1
+       2 |   0
+       3 |   1
+       4 |   0
+       5 |   1
+       6 |   0
+       7 |   1
+       8 |   0
+       9 |   1
+(10 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, (sum(two))
+   ->  IndexAggregate
+         Output: even, sum(two)
+         Group Key: tenk1.even
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+ even | sum 
+------+-----
+    1 |   0
+    3 | 100
+    5 |   0
+    7 | 100
+    9 |   0
+   11 | 100
+   13 |   0
+   15 | 100
+   17 |   0
+   19 | 100
+(10 rows)
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+                                                                         QUERY PLAN                                                                          
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+   Output: even, odd, (sum(unique1))
+   ->  IndexAggregate
+         Output: even, odd, sum(unique1)
+         Group Key: tenk1.even, tenk1.odd
+         ->  Seq Scan on public.tenk1
+               Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
+(7 rows)
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+ even | odd |  sum   
+------+-----+--------
+    1 |   0 | 495000
+    3 |   2 | 495100
+    5 |   4 | 495200
+    7 |   6 | 495300
+    9 |   8 | 495400
+   11 |  10 | 495500
+   13 |  12 | 495600
+   15 |  14 | 495700
+   17 |  16 | 495800
+   19 |  18 | 495900
+(10 rows)
+
+-- mixing columns between group by and order by
+begin;
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.x, tmp.y
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+ x | y | sum 
+---+---+-----
+ 1 | 8 |   1
+ 2 | 7 |   2
+ 3 | 6 |   3
+ 4 | 5 |   4
+(4 rows)
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+          QUERY PLAN           
+-------------------------------
+ IndexAggregate
+   Output: x, y, sum(x)
+   Group Key: tmp.y, tmp.x
+   ->  Seq Scan on pg_temp.tmp
+         Output: x, y
+(5 rows)
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+ x | y | sum 
+---+---+-----
+ 4 | 5 |   4
+ 3 | 6 |   3
+ 2 | 7 |   2
+ 1 | 8 |   1
+(4 rows)
+
+--
+-- Index Aggregation Spill tests
+--
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+ unique1 | count | sum  
+---------+-------+------
+    4976 |     1 |  976
+    4977 |     1 |  977
+    4978 |     1 |  978
+    4979 |     1 |  979
+    4980 |     1 |  980
+    4981 |     1 |  981
+    4982 |     1 |  982
+    4983 |     1 |  983
+    4984 |     1 |  984
+    4985 |     1 |  985
+    4986 |     1 |  986
+    4987 |     1 |  987
+    4988 |     1 |  988
+    4989 |     1 |  989
+    4990 |     1 |  990
+    4991 |     1 |  991
+    4992 |     1 |  992
+    4993 |     1 |  993
+    4994 |     1 |  994
+    4995 |     1 |  995
+    4996 |     1 |  996
+    4997 |     1 |  997
+    4998 |     1 |  998
+    4999 |     1 |  999
+    9976 |     1 | 1976
+    9977 |     1 | 1977
+    9978 |     1 | 1978
+    9979 |     1 | 1979
+    9980 |     1 | 1980
+    9981 |     1 | 1981
+    9982 |     1 | 1982
+    9983 |     1 | 1983
+    9984 |     1 | 1984
+    9985 |     1 | 1985
+    9986 |     1 | 1986
+    9987 |     1 | 1987
+    9988 |     1 | 1988
+    9989 |     1 | 1989
+    9990 |     1 | 1990
+    9991 |     1 | 1991
+    9992 |     1 | 1992
+    9993 |     1 | 1993
+    9994 |     1 | 1994
+    9995 |     1 | 1995
+    9996 |     1 | 1996
+    9997 |     1 | 1997
+    9998 |     1 | 1998
+    9999 |     1 | 1999
+(48 rows)
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 --
 -- Hash Aggregation Spill tests
 --
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 select unique1, count(*), sum(twothousand) from tenk1
 group by unique1
@@ -3783,6 +4016,7 @@ select g from generate_series(0, 19999) g;
 analyze agg_data_20k;
 -- Produce results with sorting.
 set enable_hashagg = false;
+set enable_indexagg = false;
 set jit_above_cost = 0;
 explain (costs off)
 select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
@@ -3852,31 +4086,74 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
   from agg_data_2k group by g/2;
 set enable_sort = true;
 set work_mem to default;
+-- Produce results with index aggregation
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+set jit_above_cost = 0;
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+           QUERY PLAN           
+--------------------------------
+ IndexAggregate
+   Group Key: (g % 10000)
+   ->  Seq Scan on agg_data_20k
+(3 rows)
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+set jit_above_cost to default;
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
 -- Compare group aggregation results to hash aggregation results
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
  a | c1 | c2 | c3 
 ---+----+----+----
 (0 rows)
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
  c1 | c2 | c3 
 ----+----+----
 (0 rows)
@@ -3889,3 +4166,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 8023014fe63..c62e312175c 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2395,8 +2395,8 @@ SELECT upper(c collate case_insensitive), count(c) FROM pagg_tab3 GROUP BY c col
 --------------------------------------------------------------
  Sort
    Sort Key: (upper(pagg_tab3.c)) COLLATE case_insensitive
-   ->  Finalize HashAggregate
-         Group Key: pagg_tab3.c
+   ->  Finalize IndexAggregate
+         Group Key: pagg_tab3.c COLLATE case_insensitive
          ->  Append
                ->  Partial HashAggregate
                      Group Key: pagg_tab3.c
@@ -2613,20 +2613,20 @@ INSERT INTO pagg_tab6 (b, c) SELECT substr('cdCD', (i % 4) + 1 , 1), substr('cdC
 ANALYZE pagg_tab6;
 EXPLAIN (COSTS OFF)
 SELECT t1.c, count(t2.c) FROM pagg_tab5 t1 JOIN pagg_tab6 t2 ON t1.c = t2.c AND t1.c = t2.b GROUP BY 1 ORDER BY t1.c COLLATE "C";
-                      QUERY PLAN                       
--------------------------------------------------------
+                        QUERY PLAN                        
+----------------------------------------------------------
  Sort
    Sort Key: t1.c COLLATE "C"
    ->  Append
-         ->  HashAggregate
-               Group Key: t1.c
+         ->  IndexAggregate
+               Group Key: t1.c COLLATE case_insensitive
                ->  Nested Loop
                      Join Filter: (t1.c = t2.c)
                      ->  Seq Scan on pagg_tab6_p1 t2
                            Filter: (c = b)
                      ->  Seq Scan on pagg_tab5_p1 t1
-         ->  HashAggregate
-               Group Key: t1_1.c
+         ->  IndexAggregate
+               Group Key: t1_1.c COLLATE case_insensitive
                ->  Nested Loop
                      Join Filter: (t1_1.c = t2_1.c)
                      ->  Seq Scan on pagg_tab6_p2 t2_1
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 5ac966186f7..0d4468fa686 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -21,27 +21,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -62,6 +59,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -110,6 +108,53 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg(t2.c))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial IndexAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test eager aggregation over join rel
 --
@@ -121,34 +166,31 @@ SELECT t1.a, avg(t2.c + t3.c)
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
   JOIN eager_agg_t3 t3 ON t2.a = t3.a
 GROUP BY t1.a ORDER BY t1.a;
-                                  QUERY PLAN                                  
-------------------------------------------------------------------------------
- Finalize GroupAggregate
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg((t2.c + t3.c))
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg((t2.c + t3.c))
-                           Group Key: t2.b
-                           ->  Hash Join
-                                 Output: t2.c, t2.b, t3.c
-                                 Hash Cond: (t3.a = t2.a)
-                                 ->  Seq Scan on public.eager_agg_t3 t3
-                                       Output: t3.a, t3.b, t3.c
-                                 ->  Hash
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                     Group Key: t2.b
+                     ->  Hash Join
+                           Output: t2.c, t2.b, t3.c
+                           Hash Cond: (t3.a = t2.a)
+                           ->  Seq Scan on public.eager_agg_t3 t3
+                                 Output: t3.a, t3.b, t3.c
+                           ->  Hash
+                                 Output: t2.c, t2.b, t2.a
+                                 ->  Seq Scan on public.eager_agg_t2 t2
                                        Output: t2.c, t2.b, t2.a
-                                       ->  Seq Scan on public.eager_agg_t2 t2
-                                             Output: t2.c, t2.b, t2.a
-(25 rows)
+(22 rows)
 
 SELECT t1.a, avg(t2.c + t3.c)
   FROM eager_agg_t1 t1
@@ -170,6 +212,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
   FROM eager_agg_t1 t1
@@ -227,6 +270,62 @@ GROUP BY t1.a ORDER BY t1.a;
 (9 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize IndexAggregate
+   Output: t1.a, avg((t2.c + t3.c))
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+               ->  Partial IndexAggregate
+                     Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                     Group Key: t2.b
+                     ->  Hash Join
+                           Output: t2.c, t2.b, t3.c
+                           Hash Cond: (t3.a = t2.a)
+                           ->  Seq Scan on public.eager_agg_t3 t3
+                                 Output: t3.a, t3.b, t3.c
+                           ->  Hash
+                                 Output: t2.c, t2.b, t2.a
+                                 ->  Seq Scan on public.eager_agg_t2 t2
+                                       Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg 
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+RESET enable_sort;
 --
 -- Test that eager aggregation works for outer join
 --
@@ -236,27 +335,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Right Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Right Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -331,30 +427,27 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
- Finalize GroupAggregate
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Gather Merge
+   ->  Gather
          Output: t1.a, (PARTIAL avg(t2.c))
          Workers Planned: 2
-         ->  Sort
+         ->  Parallel Hash Join
                Output: t1.a, (PARTIAL avg(t2.c))
-               Sort Key: t1.a
-               ->  Parallel Hash Join
-                     Output: t1.a, (PARTIAL avg(t2.c))
-                     Hash Cond: (t1.b = t2.b)
-                     ->  Parallel Seq Scan on public.eager_agg_t1 t1
-                           Output: t1.a, t1.b, t1.c
-                     ->  Parallel Hash
-                           Output: t2.b, (PARTIAL avg(t2.c))
-                           ->  Partial HashAggregate
-                                 Output: t2.b, PARTIAL avg(t2.c)
-                                 Group Key: t2.b
-                                 ->  Parallel Seq Scan on public.eager_agg_t2 t2
-                                       Output: t2.a, t2.b, t2.c
-(21 rows)
+               Hash Cond: (t1.b = t2.b)
+               ->  Parallel Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b, t1.c
+               ->  Parallel Hash
+                     Output: t2.b, (PARTIAL avg(t2.c))
+                     ->  Partial HashAggregate
+                           Output: t2.b, PARTIAL avg(t2.c)
+                           Group Key: t2.b
+                           ->  Parallel Seq Scan on public.eager_agg_t2 t2
+                                 Output: t2.a, t2.b, t2.c
+(18 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -387,27 +480,24 @@ SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t1.a, avg(t2.c)
    Group Key: t1.a
-   ->  Sort
+   ->  Hash Join
          Output: t1.a, (PARTIAL avg(t2.c))
-         Sort Key: t1.a
-         ->  Hash Join
-               Output: t1.a, (PARTIAL avg(t2.c))
-               Hash Cond: (t1.b = t2.b)
-               ->  Seq Scan on public.eager_agg_t1 t1
-                     Output: t1.a, t1.b, t1.c
-               ->  Hash
-                     Output: t2.b, (PARTIAL avg(t2.c))
-                     ->  Partial HashAggregate
-                           Output: t2.b, PARTIAL avg(t2.c)
-                           Group Key: t2.b
-                           ->  Seq Scan on public.eager_agg_t2 t2
-                                 Output: t2.a, t2.b, t2.c
-(18 rows)
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
 
 SELECT t1.a, avg(t2.c)
   FROM eager_agg_t1 t1
@@ -696,79 +786,77 @@ SELECT t1.x, sum(t2.y + t3.y)
   JOIN eager_agg_tab1 t2 ON t1.x = t2.x
   JOIN eager_agg_tab1 t3 ON t2.x = t3.x
 GROUP BY t1.x ORDER BY t1.x;
-                                        QUERY PLAN                                         
--------------------------------------------------------------------------------------------
- Sort
-   Output: t1.x, (sum((t2.y + t3.y)))
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Merge Append
    Sort Key: t1.x
-   ->  Append
-         ->  Finalize HashAggregate
-               Output: t1.x, sum((t2.y + t3.y))
-               Group Key: t1.x
-               ->  Hash Join
-                     Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
-                     Hash Cond: (t1.x = t2.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p1 t1
-                           Output: t1.x
-                     ->  Hash
-                           Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
-                                 Group Key: t2.x
-                                 ->  Hash Join
-                                       Output: t2.y, t2.x, t3.y, t3.x
-                                       Hash Cond: (t2.x = t3.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p1 t2
-                                             Output: t2.y, t2.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1.x, sum((t2.y + t3.y))
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+               Hash Cond: (t1.x = t2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p1 t1
+                     Output: t1.x
+               ->  Hash
+                     Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+                     ->  Partial HashAggregate
+                           Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+                           Group Key: t2.x
+                           ->  Hash Join
+                                 Output: t2.y, t2.x, t3.y, t3.x
+                                 Hash Cond: (t2.x = t3.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p1 t2
+                                       Output: t2.y, t2.x
+                                 ->  Hash
+                                       Output: t3.y, t3.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t3
                                              Output: t3.y, t3.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t3
-                                                   Output: t3.y, t3.x
-         ->  Finalize HashAggregate
-               Output: t1_1.x, sum((t2_1.y + t3_1.y))
-               Group Key: t1_1.x
-               ->  Hash Join
-                     Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
-                     Hash Cond: (t1_1.x = t2_1.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
-                           Output: t1_1.x
-                     ->  Hash
-                           Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
-                                 Group Key: t2_1.x
-                                 ->  Hash Join
-                                       Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                       Hash Cond: (t2_1.x = t3_1.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
-                                             Output: t2_1.y, t2_1.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1_1.x, sum((t2_1.y + t3_1.y))
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+                           Group Key: t2_1.x
+                           ->  Hash Join
+                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+                                 Hash Cond: (t2_1.x = t3_1.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Hash
+                                       Output: t3_1.y, t3_1.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
                                              Output: t3_1.y, t3_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
-                                                   Output: t3_1.y, t3_1.x
-         ->  Finalize HashAggregate
-               Output: t1_2.x, sum((t2_2.y + t3_2.y))
-               Group Key: t1_2.x
-               ->  Hash Join
-                     Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
-                     Hash Cond: (t1_2.x = t2_2.x)
-                     ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
-                           Output: t1_2.x
-                     ->  Hash
-                           Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
-                           ->  Partial HashAggregate
-                                 Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
-                                 Group Key: t2_2.x
-                                 ->  Hash Join
-                                       Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                       Hash Cond: (t2_2.x = t3_2.x)
-                                       ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
-                                             Output: t2_2.y, t2_2.x
-                                       ->  Hash
+   ->  Finalize IndexAggregate
+         Output: t1_2.x, sum((t2_2.y + t3_2.y))
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+                           Group Key: t2_2.x
+                           ->  Hash Join
+                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+                                 Hash Cond: (t2_2.x = t3_2.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Hash
+                                       Output: t3_2.y, t3_2.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
                                              Output: t3_2.y, t3_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
-                                                   Output: t3_2.y, t3_2.x
-(70 rows)
+(68 rows)
 
 SELECT t1.x, sum(t2.y + t3.y)
   FROM eager_agg_tab1 t1
@@ -803,97 +891,46 @@ SELECT t3.y, sum(t2.y + t3.y)
   JOIN eager_agg_tab1 t2 ON t1.x = t2.x
   JOIN eager_agg_tab1 t3 ON t2.x = t3.x
 GROUP BY t3.y ORDER BY t3.y;
-                                        QUERY PLAN                                         
--------------------------------------------------------------------------------------------
- Finalize GroupAggregate
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Finalize IndexAggregate
    Output: t3.y, sum((t2.y + t3.y))
    Group Key: t3.y
-   ->  Sort
+   ->  Hash Join
          Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
-         Sort Key: t3.y
+         Hash Cond: (t1.x = t2.x)
          ->  Append
-               ->  Hash Join
-                     Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
-                     Hash Cond: (t2.x = t1.x)
-                     ->  Partial GroupAggregate
-                           Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
-                           Group Key: t2.x, t3.y, t3.x
-                           ->  Incremental Sort
-                                 Output: t2.y, t2.x, t3.y, t3.x
-                                 Sort Key: t2.x, t3.y
-                                 Presorted Key: t2.x
-                                 ->  Merge Join
-                                       Output: t2.y, t2.x, t3.y, t3.x
-                                       Merge Cond: (t2.x = t3.x)
-                                       ->  Sort
-                                             Output: t2.y, t2.x
-                                             Sort Key: t2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t2
-                                                   Output: t2.y, t2.x
-                                       ->  Sort
-                                             Output: t3.y, t3.x
-                                             Sort Key: t3.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p1 t3
-                                                   Output: t3.y, t3.x
-                     ->  Hash
-                           Output: t1.x
-                           ->  Seq Scan on public.eager_agg_tab1_p1 t1
-                                 Output: t1.x
-               ->  Hash Join
-                     Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
-                     Hash Cond: (t2_1.x = t1_1.x)
-                     ->  Partial GroupAggregate
-                           Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
-                           Group Key: t2_1.x, t3_1.y, t3_1.x
-                           ->  Incremental Sort
-                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                 Sort Key: t2_1.x, t3_1.y
-                                 Presorted Key: t2_1.x
-                                 ->  Merge Join
-                                       Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
-                                       Merge Cond: (t2_1.x = t3_1.x)
-                                       ->  Sort
-                                             Output: t2_1.y, t2_1.x
-                                             Sort Key: t2_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
-                                                   Output: t2_1.y, t2_1.x
-                                       ->  Sort
+               ->  Seq Scan on public.eager_agg_tab1_p1 t1_1
+                     Output: t1_1.x
+               ->  Seq Scan on public.eager_agg_tab1_p2 t1_2
+                     Output: t1_2.x
+               ->  Seq Scan on public.eager_agg_tab1_p3 t1_3
+                     Output: t1_3.x
+         ->  Hash
+               Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y)))
+               ->  Partial IndexAggregate
+                     Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+                     Group Key: t2.x, t3.y, t3.x
+                     ->  Hash Join
+                           Output: t2.y, t2.x, t3.y, t3.x
+                           Hash Cond: (t2.x = t3.x)
+                           ->  Append
+                                 ->  Seq Scan on public.eager_agg_tab1_p1 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Seq Scan on public.eager_agg_tab1_p2 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Seq Scan on public.eager_agg_tab1_p3 t2_3
+                                       Output: t2_3.y, t2_3.x
+                           ->  Hash
+                                 Output: t3.y, t3.x
+                                 ->  Append
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t3_1
                                              Output: t3_1.y, t3_1.x
-                                             Sort Key: t3_1.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
-                                                   Output: t3_1.y, t3_1.x
-                     ->  Hash
-                           Output: t1_1.x
-                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
-                                 Output: t1_1.x
-               ->  Hash Join
-                     Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
-                     Hash Cond: (t2_2.x = t1_2.x)
-                     ->  Partial GroupAggregate
-                           Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
-                           Group Key: t2_2.x, t3_2.y, t3_2.x
-                           ->  Incremental Sort
-                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                 Sort Key: t2_2.x, t3_2.y
-                                 Presorted Key: t2_2.x
-                                 ->  Merge Join
-                                       Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
-                                       Merge Cond: (t2_2.x = t3_2.x)
-                                       ->  Sort
-                                             Output: t2_2.y, t2_2.x
-                                             Sort Key: t2_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
-                                                   Output: t2_2.y, t2_2.x
-                                       ->  Sort
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t3_2
                                              Output: t3_2.y, t3_2.x
-                                             Sort Key: t3_2.x
-                                             ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
-                                                   Output: t3_2.y, t3_2.x
-                     ->  Hash
-                           Output: t1_2.x
-                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
-                                 Output: t1_2.x
-(88 rows)
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t3_3
+                                             Output: t3_3.y, t3_3.x
+(37 rows)
 
 SELECT t3.y, sum(t2.y + t3.y)
   FROM eager_agg_tab1 t1
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d05a0ca0373..57f3af295d1 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2830,6 +2830,7 @@ select count(*) from
 set enable_hashjoin = 0;
 set enable_nestloop = 0;
 set enable_hashagg = 0;
+set enable_indexagg = 0;
 --
 -- Check that we use the pathkeys from a prefix of the group by / order by
 -- clause for the join pathkeys when that prefix covers all join quals.  We
@@ -2857,6 +2858,7 @@ order by x.thousand desc, x.twothousand;
                      ->  Seq Scan on tenk1 x
 (13 rows)
 
+reset enable_indexagg;
 reset enable_hashagg;
 reset enable_nestloop;
 reset enable_hashjoin;
@@ -9537,23 +9539,20 @@ inner join (select distinct id from j3) j3 on j1.id = j3.id;
 explain (verbose, costs off)
 select * from j1
 inner join (select id from j3 group by id) j3 on j1.id = j3.id;
-               QUERY PLAN                
------------------------------------------
+            QUERY PLAN             
+-----------------------------------
  Nested Loop
    Output: j1.id, j3.id
    Inner Unique: true
    Join Filter: (j1.id = j3.id)
-   ->  Group
+   ->  IndexAggregate
          Output: j3.id
          Group Key: j3.id
-         ->  Sort
+         ->  Seq Scan on public.j3
                Output: j3.id
-               Sort Key: j3.id
-               ->  Seq Scan on public.j3
-                     Output: j3.id
    ->  Seq Scan on public.j1
          Output: j1.id
-(14 rows)
+(11 rows)
 
 drop table j1;
 drop table j2;
@@ -9870,16 +9869,14 @@ EXPLAIN (COSTS OFF)
 SELECT 1 FROM group_tbl t1
     LEFT JOIN (SELECT a c1, COALESCE(a, a) c2 FROM group_tbl t2) s ON TRUE
 GROUP BY s.c1, s.c2;
-                   QUERY PLAN                   
-------------------------------------------------
- Group
+                QUERY PLAN                 
+-------------------------------------------
+ IndexAggregate
    Group Key: t2.a, (COALESCE(t2.a, t2.a))
-   ->  Sort
-         Sort Key: t2.a, (COALESCE(t2.a, t2.a))
-         ->  Nested Loop Left Join
-               ->  Seq Scan on group_tbl t1
-               ->  Seq Scan on group_tbl t2
-(7 rows)
+   ->  Nested Loop Left Join
+         ->  Seq Scan on group_tbl t1
+         ->  Seq Scan on group_tbl t2
+(5 rows)
 
 DROP TABLE group_tbl;
 -- Test that we ignore PlaceHolderVars when looking up statistics
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index c30304b99c7..fce941ae1f0 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -150,7 +150,7 @@ EXPLAIN (COSTS OFF)
 SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
              QUERY PLAN             
 ------------------------------------
- HashAggregate
+ IndexAggregate
    Group Key: c
    ->  Result
          Replaces: Scan on pagg_tab
@@ -177,8 +177,9 @@ SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 ---+-----
 (0 rows)
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
 SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
@@ -370,6 +371,150 @@ SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
    250
 (1 row)
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.c, (sum(pagg_tab.a)), (avg(pagg_tab.b))
+   ->  Append
+         ->  IndexAggregate
+               Group Key: pagg_tab.c
+               Filter: (avg(pagg_tab.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+         ->  IndexAggregate
+               Group Key: pagg_tab_1.c
+               Filter: (avg(pagg_tab_1.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+         ->  IndexAggregate
+               Group Key: pagg_tab_2.c
+               Filter: (avg(pagg_tab_2.d) < '15'::numeric)
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(16 rows)
+
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+  c   | sum  |         avg         | count 
+------+------+---------------------+-------
+ 0000 | 2000 | 12.0000000000000000 |   250
+ 0001 | 2250 | 13.0000000000000000 |   250
+ 0002 | 2500 | 14.0000000000000000 |   250
+ 0006 | 2500 | 12.0000000000000000 |   250
+ 0007 | 2750 | 13.0000000000000000 |   250
+ 0008 | 2000 | 14.0000000000000000 |   250
+(6 rows)
+
+-- When GROUP BY clause does not match; top finalize node is required
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Disabled: true
+   Sort Key: pagg_tab.a, (sum(pagg_tab.b)), (avg(pagg_tab.b))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab.a
+         Filter: (avg(pagg_tab.d) < '15'::numeric)
+         ->  Merge Append
+               Sort Key: pagg_tab.a
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab.a
+                     ->  Seq Scan on pagg_tab_p1 pagg_tab
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab_1.a
+                     ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+               ->  Partial IndexAggregate
+                     Group Key: pagg_tab_2.a
+                     ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(17 rows)
+
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+ a  | sum  |         avg         | count 
+----+------+---------------------+-------
+  0 | 1500 | 10.0000000000000000 |   150
+  1 | 1650 | 11.0000000000000000 |   150
+  2 | 1800 | 12.0000000000000000 |   150
+  3 | 1950 | 13.0000000000000000 |   150
+  4 | 2100 | 14.0000000000000000 |   150
+ 10 | 1500 | 10.0000000000000000 |   150
+ 11 | 1650 | 11.0000000000000000 |   150
+ 12 | 1800 | 12.0000000000000000 |   150
+ 13 | 1950 | 13.0000000000000000 |   150
+ 14 | 2100 | 14.0000000000000000 |   150
+(10 rows)
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+                   QUERY PLAN                   
+------------------------------------------------
+ Merge Append
+   Sort Key: pagg_tab.c
+   ->  IndexAggregate
+         Group Key: pagg_tab.c
+         ->  Seq Scan on pagg_tab_p1 pagg_tab
+   ->  IndexAggregate
+         Group Key: pagg_tab_1.c
+         ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+   ->  IndexAggregate
+         Group Key: pagg_tab_2.c
+         ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+(11 rows)
+
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+  c   
+------
+ 0000
+ 0001
+ 0002
+ 0003
+ 0004
+ 0005
+ 0006
+ 0007
+ 0008
+ 0009
+ 0010
+ 0011
+(12 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Group
+   Group Key: pagg_tab.a
+   ->  Merge Append
+         Sort Key: pagg_tab.a
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab.a
+               ->  Seq Scan on pagg_tab_p1 pagg_tab
+                     Filter: (a < 3)
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab_1.a
+               ->  Seq Scan on pagg_tab_p2 pagg_tab_1
+                     Filter: (a < 3)
+         ->  Partial IndexAggregate
+               Group Key: pagg_tab_2.a
+               ->  Seq Scan on pagg_tab_p3 pagg_tab_2
+                     Filter: (a < 3)
+(16 rows)
+
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+ a 
+---
+ 0
+ 1
+ 2
+(3 rows)
+
+RESET enable_sort;
 RESET enable_hashagg;
 -- ROLLUP, partitionwise aggregation does not apply
 EXPLAIN (COSTS OFF)
@@ -554,6 +699,7 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
                                QUERY PLAN                                
@@ -606,41 +752,40 @@ SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 (6 rows)
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
 -- LEFT JOIN, should produce partial partitionwise aggregation plan as
 -- GROUP BY is on nullable column
 EXPLAIN (COSTS OFF)
 SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY b.y ORDER BY 1 NULLS LAST;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Group Key: b.y
-   ->  Sort
-         Sort Key: b.y
-         ->  Append
-               ->  Partial HashAggregate
-                     Group Key: b.y
-                     ->  Hash Left Join
-                           Hash Cond: (a.x = b.y)
-                           ->  Seq Scan on pagg_tab1_p1 a
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p1 b
-               ->  Partial HashAggregate
-                     Group Key: b_1.y
-                     ->  Hash Left Join
-                           Hash Cond: (a_1.x = b_1.y)
-                           ->  Seq Scan on pagg_tab1_p2 a_1
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p2 b_1
-               ->  Partial HashAggregate
-                     Group Key: b_2.y
-                     ->  Hash Right Join
-                           Hash Cond: (b_2.y = a_2.x)
-                           ->  Seq Scan on pagg_tab2_p3 b_2
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab1_p3 a_2
-(26 rows)
+   ->  Append
+         ->  Partial HashAggregate
+               Group Key: b.y
+               ->  Hash Left Join
+                     Hash Cond: (a.x = b.y)
+                     ->  Seq Scan on pagg_tab1_p1 a
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 b
+         ->  Partial HashAggregate
+               Group Key: b_1.y
+               ->  Hash Left Join
+                     Hash Cond: (a_1.x = b_1.y)
+                     ->  Seq Scan on pagg_tab1_p2 a_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 b_1
+         ->  Partial HashAggregate
+               Group Key: b_2.y
+               ->  Hash Right Join
+                     Hash Cond: (b_2.y = a_2.x)
+                     ->  Seq Scan on pagg_tab2_p3 b_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 a_2
+(24 rows)
 
 SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY b.y ORDER BY 1 NULLS LAST;
  y  | sum  
@@ -704,35 +849,33 @@ SELECT b.y, sum(a.y) FROM pagg_tab1 a RIGHT JOIN pagg_tab2 b ON a.x = b.y GROUP
 -- GROUP BY is on nullable column
 EXPLAIN (COSTS OFF)
 SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x ORDER BY 1 NULLS LAST;
-                            QUERY PLAN                            
-------------------------------------------------------------------
- Finalize GroupAggregate
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize IndexAggregate
    Group Key: a.x
-   ->  Sort
-         Sort Key: a.x
-         ->  Append
-               ->  Partial HashAggregate
-                     Group Key: a.x
-                     ->  Hash Full Join
-                           Hash Cond: (a.x = b.y)
-                           ->  Seq Scan on pagg_tab1_p1 a
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p1 b
-               ->  Partial HashAggregate
-                     Group Key: a_1.x
-                     ->  Hash Full Join
-                           Hash Cond: (a_1.x = b_1.y)
-                           ->  Seq Scan on pagg_tab1_p2 a_1
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab2_p2 b_1
-               ->  Partial HashAggregate
-                     Group Key: a_2.x
-                     ->  Hash Full Join
-                           Hash Cond: (b_2.y = a_2.x)
-                           ->  Seq Scan on pagg_tab2_p3 b_2
-                           ->  Hash
-                                 ->  Seq Scan on pagg_tab1_p3 a_2
-(26 rows)
+   ->  Append
+         ->  Partial HashAggregate
+               Group Key: a.x
+               ->  Hash Full Join
+                     Hash Cond: (a.x = b.y)
+                     ->  Seq Scan on pagg_tab1_p1 a
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p1 b
+         ->  Partial HashAggregate
+               Group Key: a_1.x
+               ->  Hash Full Join
+                     Hash Cond: (a_1.x = b_1.y)
+                     ->  Seq Scan on pagg_tab1_p2 a_1
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab2_p2 b_1
+         ->  Partial HashAggregate
+               Group Key: a_2.x
+               ->  Hash Full Join
+                     Hash Cond: (b_2.y = a_2.x)
+                     ->  Seq Scan on pagg_tab2_p3 b_2
+                     ->  Hash
+                           ->  Seq Scan on pagg_tab1_p3 a_2
+(24 rows)
 
 SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x ORDER BY 1 NULLS LAST;
  x  | sum  
@@ -839,16 +982,14 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOI
 -- Empty join relation because of empty outer side, no partitionwise agg plan
 EXPLAIN (COSTS OFF)
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
-                  QUERY PLAN                  
-----------------------------------------------
- GroupAggregate
+               QUERY PLAN               
+----------------------------------------
+ IndexAggregate
    Group Key: pagg_tab1.y
-   ->  Sort
-         Sort Key: pagg_tab1.y
-         ->  Result
-               Replaces: Join on b, pagg_tab1
-               One-Time Filter: false
-(7 rows)
+   ->  Result
+         Replaces: Join on b, pagg_tab1
+         One-Time Filter: false
+(5 rows)
 
 SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY a.x, a.y ORDER BY 1, 2;
  x | y | count 
@@ -869,7 +1010,7 @@ SELECT a, sum(b), avg(c), count(*) FROM pagg_tab_m GROUP BY a HAVING avg(c) < 22
 --------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_m.a, (sum(pagg_tab_m.b)), (avg(pagg_tab_m.c))
-   ->  Finalize HashAggregate
+   ->  Finalize IndexAggregate
          Group Key: pagg_tab_m.a
          Filter: (avg(pagg_tab_m.c) < '22'::numeric)
          ->  Append
@@ -1067,8 +1208,8 @@ RESET parallel_setup_cost;
 -- PARTITION KEY, thus we will have a partial aggregation for them.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
+                                QUERY PLAN                                 
+---------------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_ml.a, (sum(pagg_tab_ml.b)), (count(*))
    ->  Append
@@ -1076,31 +1217,27 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER B
                Group Key: pagg_tab_ml.a
                Filter: (avg(pagg_tab_ml.b) < '3'::numeric)
                ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
-         ->  Finalize GroupAggregate
+         ->  Finalize IndexAggregate
                Group Key: pagg_tab_ml_2.a
                Filter: (avg(pagg_tab_ml_2.b) < '3'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_ml_2.a
-                     ->  Append
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_2.a
-                                 ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_3.a
-                                 ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
-         ->  Finalize GroupAggregate
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_2.a
+                           ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_3.a
+                           ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         ->  Finalize IndexAggregate
                Group Key: pagg_tab_ml_5.a
                Filter: (avg(pagg_tab_ml_5.b) < '3'::numeric)
-               ->  Sort
-                     Sort Key: pagg_tab_ml_5.a
-                     ->  Append
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_5.a
-                                 ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
-                           ->  Partial HashAggregate
-                                 Group Key: pagg_tab_ml_6.a
-                                 ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
-(31 rows)
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_5.a
+                           ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_ml_6.a
+                           ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
+(27 rows)
 
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
  a  | sum  | count 
@@ -1120,31 +1257,29 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING avg(b) < 3 ORDER B
 -- PARTITION KEY
 EXPLAIN (COSTS OFF)
 SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b ORDER BY 1, 2, 3;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: pagg_tab_ml.b, (sum(pagg_tab_ml.a)), (count(*))
-   ->  Finalize GroupAggregate
+   ->  Finalize IndexAggregate
          Group Key: pagg_tab_ml.b
-         ->  Sort
-               Sort Key: pagg_tab_ml.b
-               ->  Append
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml.b
-                           ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_1.b
-                           ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_2.b
-                           ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_3.b
-                           ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
-                     ->  Partial HashAggregate
-                           Group Key: pagg_tab_ml_4.b
-                           ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
-(22 rows)
+         ->  Append
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml.b
+                     ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_1.b
+                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_2.b
+                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_3.b
+                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
+               ->  Partial HashAggregate
+                     Group Key: pagg_tab_ml_4.b
+                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
+(20 rows)
 
 SELECT b, sum(a), count(*) FROM pagg_tab_ml GROUP BY b HAVING avg(a) < 15 ORDER BY 1, 2, 3;
  b |  sum  | count 
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 933921d1860..0318863bf1f 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -706,18 +706,16 @@ alter table tenk2 reset (parallel_workers);
 set enable_hashagg = false;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
-                     QUERY PLAN                     
-----------------------------------------------------
+                  QUERY PLAN                  
+----------------------------------------------
  Finalize GroupAggregate
    Group Key: twenty
    ->  Gather Merge
          Workers Planned: 4
-         ->  Partial GroupAggregate
+         ->  Partial IndexAggregate
                Group Key: twenty
-               ->  Sort
-                     Sort Key: twenty
-                     ->  Parallel Seq Scan on tenk1
-(9 rows)
+               ->  Parallel Seq Scan on tenk1
+(7 rows)
 
 select count(*) from tenk1 group by twenty;
  count 
@@ -772,19 +770,17 @@ drop function sp_simple_func(integer);
 -- test handling of SRFs in targetlist (bug in 10.0)
 explain (costs off)
    select count(*), generate_series(1,2) from tenk1 group by twenty;
-                        QUERY PLAN                        
-----------------------------------------------------------
+                     QUERY PLAN                     
+----------------------------------------------------
  ProjectSet
    ->  Finalize GroupAggregate
          Group Key: twenty
          ->  Gather Merge
                Workers Planned: 4
-               ->  Partial GroupAggregate
+               ->  Partial IndexAggregate
                      Group Key: twenty
-                     ->  Sort
-                           Sort Key: twenty
-                           ->  Parallel Seq Scan on tenk1
-(10 rows)
+                     ->  Parallel Seq Scan on tenk1
+(8 rows)
 
 select count(*), generate_series(1,2) from tenk1 group by twenty;
  count | generate_series 
@@ -833,6 +829,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 explain (costs off)
    select count(*) from tenk1 group by twenty;
                      QUERY PLAN                     
@@ -876,6 +873,7 @@ select count(*) from tenk1 group by twenty;
 reset parallel_leader_participation;
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 explain (costs off)
 select * from
   (select string4, count(unique2)
@@ -917,6 +915,7 @@ select * from
 (12 rows)
 
 reset enable_material;
+reset enable_indexagg;
 reset enable_hashagg;
 -- check parallelized int8 aggregate (bug #14897)
 explain (costs off)
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..d32bec316d3 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -157,6 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_hashagg                 | on
  enable_hashjoin                | on
  enable_incremental_sort        | on
+ enable_indexagg                | on
  enable_indexonlyscan           | on
  enable_indexscan               | on
  enable_material                | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(25 rows)
+(26 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 850f5a5787f..f72eb367112 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1392,6 +1392,7 @@ CREATE INDEX btg_x_y_idx ON btg(x, y);
 ANALYZE btg;
 
 SET enable_hashagg = off;
+SET enable_indexagg = off;
 SET enable_seqscan = off;
 
 -- Utilize the ordering of index scan to avoid a Sort operation
@@ -1623,12 +1624,100 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+
+--
+-- Index Aggregation tests
+--
+
+set enable_hashagg = false;
+set enable_sort = false;
+set enable_indexagg = true;
+set enable_indexscan = false;
+
+-- require ordered output
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT unique1, SUM(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+SELECT even, sum(two) FROM tenk1
+GROUP BY 1
+ORDER BY 1
+LIMIT 10;
+
+-- multiple grouping columns
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+SELECT even, odd, sum(unique1) FROM tenk1
+GROUP BY 1, 2
+ORDER BY 1, 2
+LIMIT 10;
+
+-- mixing columns between group by and order by
+begin;
+
+create temp table tmp(x int, y int);
+insert into tmp values (1, 8), (2, 7), (3, 6), (4, 5);
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 1, 2;
+
+EXPLAIN (COSTS OFF, VERBOSE)
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+SELECT x, y, sum(x) FROM tmp
+GROUP BY 1, 2
+ORDER BY 2, 1;
+
+--
+-- Index Aggregation Spill tests
+--
+
+set enable_indexagg = true;
+set enable_sort=false;
+set enable_hashagg = false;
+set work_mem='64kB';
+
+select unique1, count(*), sum(twothousand) from tenk1
+group by unique1
+having sum(fivethous) > 4975
+order by sum(twothousand);
+
+set work_mem to default;
+set enable_sort to default;
+set enable_hashagg to default;
+set enable_indexagg to default;
 
 --
 -- Hash Aggregation Spill tests
 --
 
 set enable_sort=false;
+set enable_indexagg = false;
 set work_mem='64kB';
 
 select unique1, count(*), sum(twothousand) from tenk1
@@ -1657,6 +1746,7 @@ analyze agg_data_20k;
 -- Produce results with sorting.
 
 set enable_hashagg = false;
+set enable_indexagg = false;
 
 set jit_above_cost = 0;
 
@@ -1728,23 +1818,68 @@ select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
 set enable_sort = true;
 set work_mem to default;
 
+-- Produce results with index aggregation
+
+set enable_sort = false;
+set enable_hashagg = false;
+set enable_indexagg = true;
+
+set jit_above_cost = 0;
+
+explain (costs off)
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_1 as
+select g%10000 as c1, sum(g::numeric) as c2, count(*) as c3
+  from agg_data_20k group by g%10000;
+
+create table agg_index_2 as
+select * from
+  (values (100), (300), (500)) as r(a),
+  lateral (
+    select (g/2)::numeric as c1,
+           array_agg(g::numeric) as c2,
+	   count(*) as c3
+    from agg_data_2k
+    where g < r.a
+    group by g/2) as s;
+
+set jit_above_cost to default;
+
+create table agg_index_3 as
+select (g/2)::numeric as c1, sum(7::int4) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
+create table agg_index_4 as
+select (g/2)::numeric as c1, array_agg(g::numeric) as c2, count(*) as c3
+  from agg_data_2k group by g/2;
+
 -- Compare group aggregation results to hash aggregation results
 
 (select * from agg_hash_1 except select * from agg_group_1)
   union all
-(select * from agg_group_1 except select * from agg_hash_1);
+(select * from agg_group_1 except select * from agg_hash_1)
+  union all
+(select * from agg_index_1 except select * from agg_group_1);
 
 (select * from agg_hash_2 except select * from agg_group_2)
   union all
-(select * from agg_group_2 except select * from agg_hash_2);
+(select * from agg_group_2 except select * from agg_hash_2)
+  union all
+(select * from agg_index_2 except select * from agg_group_2);
 
 (select * from agg_hash_3 except select * from agg_group_3)
   union all
-(select * from agg_group_3 except select * from agg_hash_3);
+(select * from agg_group_3 except select * from agg_hash_3)
+  union all
+(select * from agg_index_3 except select * from agg_group_3);
 
 (select * from agg_hash_4 except select * from agg_group_4)
   union all
-(select * from agg_group_4 except select * from agg_hash_4);
+(select * from agg_group_4 except select * from agg_hash_4)
+  union all
+(select * from agg_index_4 except select * from agg_group_4);
 
 drop table agg_group_1;
 drop table agg_group_2;
@@ -1754,3 +1889,7 @@ drop table agg_hash_1;
 drop table agg_hash_2;
 drop table agg_hash_3;
 drop table agg_hash_4;
+drop table agg_index_1;
+drop table agg_index_2;
+drop table agg_index_3;
+drop table agg_index_4;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index abe6d6ae09f..f9f4b5dcebd 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -35,6 +35,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c)
@@ -48,6 +49,25 @@ SELECT t1.a, avg(t2.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 
 --
@@ -71,6 +91,7 @@ GROUP BY t1.a ORDER BY t1.a;
 
 -- Produce results with sorting aggregation
 SET enable_hashagg TO off;
+SET enable_indexagg TO off;
 
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a, avg(t2.c + t3.c)
@@ -86,7 +107,27 @@ SELECT t1.a, avg(t2.c + t3.c)
 GROUP BY t1.a ORDER BY t1.a;
 
 RESET enable_hashagg;
+RESET enable_indexagg;
 
+-- Produce results with index aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+  FROM eager_agg_t1 t1
+  JOIN eager_agg_t2 t2 ON t1.b = t2.b
+  JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+RESET enable_sort;
 
 --
 -- Test that eager aggregation works for outer join
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index b91fb7574df..4250b366b1a 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -605,6 +605,7 @@ select count(*) from
 set enable_hashjoin = 0;
 set enable_nestloop = 0;
 set enable_hashagg = 0;
+set enable_indexagg = 0;
 
 --
 -- Check that we use the pathkeys from a prefix of the group by / order by
@@ -617,6 +618,7 @@ from tenk1 x inner join tenk1 y on x.thousand = y.thousand
 group by x.thousand, x.twothousand
 order by x.thousand desc, x.twothousand;
 
+reset enable_indexagg;
 reset enable_hashagg;
 reset enable_nestloop;
 reset enable_hashjoin;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index 7c725e2663a..570aac38fc5 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -55,8 +55,9 @@ EXPLAIN (COSTS OFF)
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 SELECT c, sum(a) FROM pagg_tab WHERE c = 'x' GROUP BY c;
 
--- Test GroupAggregate paths by disabling hash aggregates.
+-- Test GroupAggregate paths by disabling hash and index aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 
 -- When GROUP BY clause matches full aggregation is performed for each partition.
 EXPLAIN (COSTS OFF)
@@ -81,6 +82,32 @@ EXPLAIN (COSTS OFF)
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 SELECT count(*) FROM pagg_tab GROUP BY c ORDER BY c LIMIT 1;
 
+RESET enable_hashagg;
+RESET enable_indexagg;
+
+-- Test IndexAggregate paths by disabling hash and group aggregates.
+SET enable_sort TO false;
+SET enable_hashagg TO false;
+
+-- When GROUP BY clause matches full aggregation is performed for each partition.
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT c, sum(a), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- When GROUP BY clause does not match; top finalize node is required
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+SELECT a, sum(b), avg(b), count(*) FROM pagg_tab GROUP BY 1 HAVING avg(d) < 15 ORDER BY 1, 2, 3;
+
+-- Test partitionwise grouping without any aggregates
+EXPLAIN (COSTS OFF)
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+SELECT c FROM pagg_tab GROUP BY c ORDER BY 1;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+SELECT a FROM pagg_tab WHERE a < 3 GROUP BY a ORDER BY 1;
+
+RESET enable_sort;
 RESET enable_hashagg;
 
 -- ROLLUP, partitionwise aggregation does not apply
@@ -135,10 +162,12 @@ SELECT t2.y, sum(t1.y), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2
 -- When GROUP BY clause does not match; partial aggregation is performed for each partition.
 -- Also test GroupAggregate paths by disabling hash aggregates.
 SET enable_hashagg TO false;
+SET enable_indexagg TO false;
 EXPLAIN (COSTS OFF)
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 SELECT t1.y, sum(t1.x), count(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y HAVING avg(t1.x) > 10 ORDER BY 1, 2, 3;
 RESET enable_hashagg;
+RESET enable_indexagg;
 
 -- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
 -- aggregation
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 71a75bc86ea..5f398219166 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -318,6 +318,7 @@ select count(*), generate_series(1,2) from tenk1 group by twenty;
 
 -- test gather merge with parallel leader participation disabled
 set parallel_leader_participation = off;
+set enable_indexagg = off;
 
 explain (costs off)
    select count(*) from tenk1 group by twenty;
@@ -328,6 +329,7 @@ reset parallel_leader_participation;
 
 --test rescan behavior of gather merge
 set enable_material = false;
+set enable_indexagg = false;
 
 explain (costs off)
 select * from
@@ -341,6 +343,7 @@ select * from
   right join (values (1),(2),(3)) v(x) on true;
 
 reset enable_material;
+reset enable_indexagg;
 
 reset enable_hashagg;
 
-- 
2.43.0

v4-0001-add-in-memory-T-tree-tuple-index.patchtext/x-patch; charset=UTF-8; name=v4-0001-add-in-memory-T-tree-tuple-index.patchDownload
From a7403af67b54cd30fa11662dc5d3f7e15d4a1d2f Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 15:25:41 +0300
Subject: [PATCH v4 1/5] add in-memory T-tree tuple index

This patch implements the T-tree structure. It will be used as the in-memory
index for a special index-based grouping strategy.

It supports separate memory contexts for tracking memory allocations.
And, just like TupleHashTable, lookup takes an 'isnew' pointer which can be
used to prevent new tuple creation (i.e. when the memory limit is reached).

It also supports the key abbreviation optimization, like tuplesort. Some of
that code was copied and looks exactly the same, so it is worth factoring
the shared logic into a separate function.

For now it supports only the insert operation and no delete, because no
delete operations happen during aggregation.
---
 src/backend/executor/execGrouping.c | 963 ++++++++++++++++++++++++++++
 src/include/executor/executor.h     |  65 ++
 src/include/nodes/execnodes.h       | 148 ++++-
 3 files changed, 1152 insertions(+), 24 deletions(-)

diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index c107514a85d..3155145a2a8 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -622,3 +622,966 @@ TupleHashTableMatch(struct tuplehash_hash *tb, MinimalTuple tuple1, MinimalTuple
 	econtext->ecxt_outertuple = slot1;
 	return !ExecQualAndReset(hashtable->cur_eq_func, econtext);
 }
+
+/*****************************************************************************
+ * 		Utility routines for all-in-memory T-Tree
+ * 
+ * These routines build a T-tree index for grouping tuples together (e.g. for
+ * index aggregation).  There is one entry for each not-distinct set of tuples
+ * presented.
+ *****************************************************************************/
+
+/*
+ * Representation of an entry being searched for in the tuple index.  It has
+ * a separate representation to avoid the unnecessary memory allocation of
+ * building a MinimalTuple for a TupleIndexEntry.
+ */
+typedef struct TupleIndexSearchEntryData
+{
+	TupleTableSlot *slot;		/* search TupleTableSlot */
+	Datum	key1;				/* first searched key data */
+	bool	isnull1;			/* first searched key is null */
+} TupleIndexSearchEntryData;
+
+typedef TupleIndexSearchEntryData *TupleIndexSearchEntry;
+
+/*
+ * compare_index_tuple_tiebreak
+ * 		Perform full comparison of tuples without key abbreviation.
+ *
+ * Invoked when the first key (possibly abbreviated) cannot decide the
+ * comparison, so we have to compare the remaining keys.
+ */
+static inline int
+compare_index_tuple_tiebreak(TupleIndex index, TupleIndexEntry left,
+							 TupleIndexSearchEntry right)
+{
+	HeapTupleData ltup;
+	SortSupport sortKey = index->sortKeys;
+	TupleDesc tupDesc = index->tupDesc;
+	AttrNumber	attno;
+	Datum		datum1,
+				datum2;
+	bool		isnull1,
+				isnull2;
+	int			cmp;
+
+	ltup.t_len = left->tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	ltup.t_data = (HeapTupleHeader) ((char *) left->tuple - MINIMAL_TUPLE_OFFSET);
+
+	if (sortKey->abbrev_converter)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortAbbrevFullComparator(datum1, isnull1,
+											datum2, isnull2,
+											sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	sortKey++;
+	for (int nkey = 1; nkey < index->nkeys; nkey++, sortKey++)
+	{
+		attno = sortKey->ssup_attno;
+
+		datum1 = heap_getattr(&ltup, attno, tupDesc, &isnull1);
+		datum2 = slot_getattr(right->slot, attno, &isnull2);
+
+		cmp = ApplySortComparator(datum1, isnull1,
+								  datum2, isnull2,
+								  sortKey);
+		if (cmp != 0)
+			return cmp;
+	}
+
+	return 0;
+}
+
+/*
+ * compare_index_tuple
+ * 		Compare a pair of tuples during index lookup.
+ *
+ * The comparison honors key abbreviation.
+ */
+static int
+compare_index_tuple(TupleIndex index,
+					TupleIndexEntry left,
+					TupleIndexSearchEntry right)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+	int	cmp = 0;
+	
+	cmp = ApplySortComparator(left->key1, left->isnull1,
+							  right->key1, right->isnull1,
+							  sortKey);
+	if (cmp != 0)
+		return cmp;
+
+	return compare_index_tuple_tiebreak(index, left, right);
+}
+
+/*
+ * tuple_index_node_bsearch
+ * 		Perform binary search within an index node.
+ *
+ * On return, if 'found' is set to true, an exact match was found and the
+ * return value is its index in the tuples array.  Otherwise the value is
+ * interpreted differently:
+ * - for internal nodes, 0 or ntuples tells which child to descend into
+ * - for leaf nodes it is the position at which the new entry must be inserted.
+ */
+static int
+tuple_index_node_bsearch(TupleIndex index, TupleIndexNode node,
+						 TupleIndexSearchEntry search, bool *found)
+{
+	int low;
+	int high;
+	int cmp;
+
+	/*
+	 * During tree traversal the main goal is to find the bounding node, so
+	 * quickly rejecting nodes known not to be bounding improves performance.
+	 */
+	if (node->ntuples < 2)
+	{
+		if (node->ntuples == 1)
+		{
+			cmp = compare_index_tuple(index, node->tuples[0], search);
+			if (cmp == 0)
+			{
+				*found = true;
+				return 0;
+			}
+
+			*found = false;
+			return cmp < 0 ? 1 : 0;
+		}
+		else
+		{
+			/* can happen only when inserting the first entry into an empty index */
+			*found = false;
+			return 0;
+		}
+	}
+
+	/* minimum */
+	cmp = compare_index_tuple(index, node->tuples[0], search);
+	if (cmp == 0)
+	{
+		*found = true;
+		return 0;
+	}
+
+	if (cmp > 0)
+	{
+		*found = false;
+		return 0;
+	}
+
+	/* maximum */
+	cmp = compare_index_tuple(index, node->tuples[node->ntuples - 1], search);
+	if (cmp == 0)
+	{
+		*found = true;
+		return node->ntuples - 1;
+	}
+
+	if (cmp < 0)
+	{
+		*found = false;
+		return node->ntuples;
+	}
+
+	/* binary search of middle */
+	low = 1;
+	high = node->ntuples - 1;
+	*found = false;
+
+	while (low < high)
+	{
+		OffsetNumber mid = (low + high) / 2;
+		TupleIndexEntry mid_entry = node->tuples[mid];
+
+		cmp = compare_index_tuple(index, mid_entry, search);
+		if (cmp == 0)
+		{
+			*found = true;
+			return mid;
+		}
+
+		if (cmp < 0)
+			low = mid + 1;
+		else
+			high = mid;
+	}
+
+	return low;
+}
+
+static inline TupleIndexNode
+AllocIndexNode(TupleIndex index)
+{
+	TupleIndexNode node = (TupleIndexNode) MemoryContextAllocZero(
+							index->nodecxt, sizeof(TupleIndexNodeData));
+
+	node->height = 1;	/* initial height */
+	return node;
+}
+
+static inline Datum
+mintup_getattr(MinimalTuple tup, TupleDesc tupdesc, AttrNumber attnum, bool *isnull)
+{
+	HeapTupleData htup;
+
+	htup.t_len = tup->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tup - MINIMAL_TUPLE_OFFSET);
+
+	return heap_getattr(&htup, attnum, tupdesc, isnull);
+}
+
+static inline bool
+index_node_is_leaf(TupleIndexNode node)
+{
+	return node->left == NULL && node->right == NULL;
+}
+
+static inline bool
+index_node_is_half_leaf(TupleIndexNode node)
+{
+	if (node->left == NULL)
+		return node->right != NULL;
+	return node->right == NULL;
+}
+
+static inline int
+index_node_height(TupleIndexNode node)
+{
+	if (node == NULL)
+		return 0;
+
+	return node->height;
+}
+
+static inline int
+index_node_calculate_height(TupleIndexNode node)
+{
+	return Max(index_node_height(node->left),
+			   index_node_height(node->right)) + 1;
+}
+
+static inline int
+index_node_get_balance(TupleIndexNode node)
+{
+	return index_node_height(node->left) - index_node_height(node->right);
+}
+
+/*
+ * tuple_index_node_check_special_rotation
+ *      When performing an LR/RL rotation, check for the corner case that
+ *      would leave an internal node non-full.
+ *
+ * A T-tree has the invariant that internal nodes must be completely full.
+ * There is a corner case when a parent and its child are half-leaves and
+ * the bottom node is a (newly added) leaf: after performing the RL/LR
+ * rotation, the old bottom node (holding a single entry) would become an
+ * internal node with a single element, violating the invariant.
+ *
+ * To fix this we forcibly move entries from the middle node into the bottom
+ * one and then perform the required rotation.
+ *
+ * 'lr' tells which rotation we are about to perform and is needed to
+ * transfer entries between nodes correctly.
+ */
+static void
+tuple_index_node_check_special_rotation(TupleIndexNode parent,
+										TupleIndexNode middle,
+										TupleIndexNode bottom,
+										bool lr)
+{
+	/*
+	 * The paper describes the condition under which this case happens: two
+	 * half-leaves with a leaf at the bottom.  A couple of facts follow from
+	 * it: the bottom node must hold a single element and the middle node
+	 * must be full.
+	 *
+	 * We check only the root cause and leave asserts for the rest, in order
+	 * to detect invalid usage or bugs.
+	 */
+	if (!(   index_node_is_half_leaf(parent)
+		  && index_node_is_half_leaf(middle) 
+		  && index_node_is_leaf(bottom)))
+		return;
+
+	Assert(bottom->ntuples == 1);
+	Assert(middle->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+	/* move tuples from middle node to bottom */
+	if (lr)
+	{
+		bottom->tuples[TUPLE_INDEX_NODE_MAX_ENTRIES - 1] = bottom->tuples[0];
+		memmove(&bottom->tuples[0], &middle->tuples[1],
+				sizeof(TupleIndexEntry) * (TUPLE_INDEX_NODE_MAX_ENTRIES - 1));
+	}
+	else
+	{
+		memmove(&bottom->tuples[1], &middle->tuples[0],
+				sizeof(TupleIndexEntry) * (TUPLE_INDEX_NODE_MAX_ENTRIES - 1));
+		middle->tuples[0] = middle->tuples[TUPLE_INDEX_NODE_MAX_ENTRIES - 1];
+	}
+
+	bottom->ntuples = TUPLE_INDEX_NODE_MAX_ENTRIES;
+	middle->ntuples = 1;
+}
+
+/*
+ * tuple_index_node_rotate_left
+ *      Perform a left rotation of the subtree, returning its new root
+ */
+static TupleIndexNode
+tuple_index_node_rotate_left(TupleIndexNode p)
+{
+	TupleIndexNode l = p->left;
+	TupleIndexNode lr = l->right;
+
+	l->right = p;
+	p->left = lr;
+
+	p->height = index_node_calculate_height(p);
+	l->height = index_node_calculate_height(l);
+
+	return l;
+}
+
+
+/*
+ * tuple_index_node_rotate_right
+ *      Perform a right rotation of the subtree, returning its new root
+ */
+static TupleIndexNode
+tuple_index_node_rotate_right(TupleIndexNode p)
+{
+	TupleIndexNode r = p->right;
+	TupleIndexNode rl = r->left;
+
+	r->left = p;
+	p->right = rl;
+	
+	p->height = index_node_calculate_height(p);
+	r->height = index_node_calculate_height(r);
+
+	return r;
+}
+
+/*
+ * tuple_index_insert_fixup
+ *     Check the balance of a T-tree index node and perform rotations if needed
+ *
+ * This function is invoked after a new node was created and the balance may
+ * have changed.  The 'checkbalance' flag is passed down the call stack for
+ * this purpose and must be checked after each recursive call returns; it
+ * signals a possible imbalance, in which case the required rotation is
+ * performed here.
+ *
+ * Note that the original T-tree paper assigns a different meaning to the
+ * letters in rotation names: the first letter is the side that caused the
+ * imbalance, while in an AVL tree it is the direction of the rotation,
+ * i.e. RR in a T-tree is LL in an AVL tree.
+ *
+ * Returns the root of the subtree, which can differ from the node passed in
+ * if a rotation was performed.  We do not store parent pointers, so this
+ * function must be called by the parent of the subtree.
+ */
+static TupleIndexNode
+tuple_index_insert_fixup(TupleIndex index, TupleIndexNode node)
+{
+	int balance;
+
+	/*
+	 * Update the node's height: this function is invoked right after the
+	 * insertion of a new node, so the height may have changed.
+	 */
+	node->height = index_node_calculate_height(node);
+
+	balance = index_node_get_balance(node);
+
+	/* node is balanced */
+	if (-1 <= balance && balance <= 1)
+		return node;
+
+	if (balance < -1)
+	{
+		balance = index_node_get_balance(node->right);
+		if (balance > 0)
+		{
+			tuple_index_node_check_special_rotation(node, node->right,
+													node->right->left, false);
+			node->right = tuple_index_node_rotate_left(node->right);
+		}
+		node = tuple_index_node_rotate_right(node);
+	}
+	else /* balance > 1 */
+	{
+		balance = index_node_get_balance(node->left);
+		if (balance < 0)
+		{
+			tuple_index_node_check_special_rotation(node, node->left,
+													node->left->right, true);
+			node->left = tuple_index_node_rotate_right(node->left);
+		}
+		node = tuple_index_node_rotate_left(node);
+	}
+
+	return node;
+}
+
+/*
+ * tuple_index_insert_greatest_lower_bound
+ *     Insert a new greatest lower bound into the given subtree
+ *
+ * After evicting the old node minimum, we must make it the new greatest
+ * lower bound of the left subtree.  This recursive function traverses to
+ * the rightmost node of the subtree and inserts the entry there, possibly
+ * creating a new node if the old one was full.
+ */
+static void
+tuple_index_insert_greatest_lower_bound(TupleIndex index, TupleIndexNode node,
+										TupleIndexEntry entry, int idx,
+										bool *checkbalance)
+{
+	if (node->right == NULL)
+	{
+		/* in-place insertion */
+		if (node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES)
+		{
+			node->tuples[node->ntuples] = entry;
+			node->ntuples++;
+			return;
+		}
+
+		node->right = AllocIndexNode(index);
+		node->right->tuples[0] = entry;
+		node->right->ntuples = 1;
+		if (node->left == NULL)
+		{
+			node->height = 2;
+			*checkbalance = true;
+		}
+		else
+		{
+			/* otherwise the height of this node cannot change */
+		}
+	}
+	else
+	{
+		tuple_index_insert_greatest_lower_bound(index, node->right, entry, idx,
+												checkbalance);
+		if (*checkbalance)
+			node->right = tuple_index_insert_fixup(index, node->right);
+	}
+}
+
+static inline TupleIndexEntry
+tuple_index_create_entry(TupleIndex index, TupleIndexSearchEntry search)
+{
+	MemoryContext oldcxt;
+	TupleIndexEntry entry;
+
+	oldcxt = MemoryContextSwitchTo(index->tuplecxt);
+
+	entry = palloc(sizeof(TupleIndexEntryData));
+	entry->tuple = ExecCopySlotMinimalTupleExtra(search->slot, index->additionalsize);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	/*
+	 * key1 in the search entry points into a TupleTableSlot, which has its
+	 * own lifetime, so we must not simply copy that Datum.
+	 *
+	 * But if key abbreviation is in use, then we should copy it from the
+	 * search entry: this is safe (abbreviated keys are pass-by-value) and
+	 * recomputing it could spoil the abbreviation statistics.
+	 */
+	if (index->sortKeys->abbrev_converter)
+	{
+		entry->isnull1 = search->isnull1;
+		entry->key1 = search->key1;
+	}
+	else
+	{
+		SortSupport sortKey = &index->sortKeys[0];
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 sortKey->ssup_attno, &entry->isnull1);
+	}
+
+	return entry;
+}
+
+static TupleIndexEntry
+tuple_index_node_lookup(TupleIndex index, TupleIndexNode node,
+						TupleIndexSearchEntry search, bool *is_new,
+						bool *checkbalance)
+{
+	TupleIndexEntry entry;
+	int idx;
+	bool found;
+
+	bool insert_here;
+	bool have_space;
+	bool is_bounding;
+
+	idx = tuple_index_node_bsearch(index, node, search, &found);
+	if (found)
+	{
+		if (is_new)
+			*is_new = false;
+		return node->tuples[idx];
+	}
+
+	insert_here = false;
+	is_bounding = 0 < idx && idx < node->ntuples;
+	have_space = node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES;
+
+	if (is_bounding)
+		/* if the node is bounding we must always insert the entry here */
+		insert_here = true;
+	else if (have_space &&
+			 ((idx == 0 			&& node->left == NULL) ||
+			  (idx == node->ntuples && node->right == NULL)))
+		/*
+		 * This node is not bounding, but if there is no suitable child and
+		 * we have enough space, just insert the entry into this node, since
+		 * that saves some extra space.  In this case the value becomes the
+		 * new min/max.
+		 */
+		insert_here = true;
+
+	if (insert_here)
+	{
+		/* no equal entry found, but we are asked not to create new entries */
+		if (is_new == NULL)
+			return NULL;
+
+		entry = tuple_index_create_entry(index, search);
+
+		if (have_space)
+		{
+			/* we have space, so just insert into sorted array */
+			Assert(node->ntuples < TUPLE_INDEX_NODE_MAX_ENTRIES);
+			Assert(0 <= idx && idx <= node->ntuples);
+
+			if (idx < node->ntuples)
+				memmove(&node->tuples[idx + 1], &node->tuples[idx],
+						sizeof(TupleIndexEntry) * (node->ntuples - idx));
+
+			node->tuples[idx] = entry;
+			node->ntuples++;
+		}
+		else
+		{
+			/*
+			 * This node is bounding but has no free space, so we must evict
+			 * the current minimum entry and make it the new greatest lower
+			 * bound.
+			 */
+			TupleIndexEntry oldmin;
+
+			Assert(node->ntuples == TUPLE_INDEX_NODE_MAX_ENTRIES);
+
+			oldmin = node->tuples[0];
+
+			/* insert new entry into this node */
+			idx--;
+			if (0 < idx)
+				memmove(&node->tuples[0], &node->tuples[1],
+						sizeof(TupleIndexEntry) * idx);
+			node->tuples[idx] = entry;
+
+			/* make old minimum a new greatest lower bound in left subtree */
+			if (node->left == NULL)
+			{
+				node->left = AllocIndexNode(index);
+				node->left->tuples[0] = oldmin;
+				node->left->ntuples = 1;
+
+				/* if right is also NULL the node was a leaf, so its height grows to 2 */
+				if (node->right == NULL)
+				{
+					node->height = 2;
+					*checkbalance = true;
+				}
+
+				return entry;
+			}
+			else
+			{
+				/* Search for suitable node and perform insertion */
+				tuple_index_insert_greatest_lower_bound(index, node->left, oldmin, idx,
+														checkbalance);
+				if (*checkbalance)
+					node->left = tuple_index_insert_fixup(index, node->left);
+			}
+		}
+
+		index->ntuples++;
+		*is_new = true;
+	}
+	else
+	{
+		/* non-bounding node - recurse into children */
+		TupleIndexNode *recurse_node;
+
+		Assert(idx == 0 || idx == node->ntuples);
+
+		if (idx == 0)
+			recurse_node = &node->left;
+		else
+			recurse_node = &node->right;
+
+		if (*recurse_node == NULL)
+		{
+			if (!is_new)
+				return NULL;
+
+			*recurse_node = AllocIndexNode(index);
+			node->height = index_node_calculate_height(node);
+			entry = tuple_index_create_entry(index, search);
+			(*recurse_node)->tuples[0] = entry;
+			(*recurse_node)->ntuples = 1;
+
+			index->ntuples++;
+			*is_new = true;
+			*checkbalance = true;
+		}
+		else
+		{
+			entry = tuple_index_node_lookup(index, *recurse_node, search, is_new,
+											checkbalance);
+
+			if (*checkbalance)
+				*recurse_node = tuple_index_insert_fixup(index, *recurse_node);
+		}
+	}
+	return entry;
+}
+
+static void
+remove_index_abbreviations_walker(TupleIndex index, TupleIndexNode node)
+{
+	for (int i = 0; i < node->ntuples; i++)
+	{
+		TupleIndexEntry entry = node->tuples[i];
+		entry->key1 = mintup_getattr(entry->tuple, index->tupDesc,
+									 index->sortKeys[0].ssup_attno,
+									 &entry->isnull1);
+	}
+
+	if (node->left)
+		remove_index_abbreviations_walker(index, node->left);
+
+	if (node->right)
+		remove_index_abbreviations_walker(index, node->right);
+}
+
+static void
+remove_index_abbreviations(TupleIndex index)
+{
+	SortSupport sortKey = &index->sortKeys[0];
+
+	sortKey->comparator = sortKey->abbrev_full_comparator;
+	sortKey->abbrev_converter = NULL;
+	sortKey->abbrev_abort = NULL;
+	sortKey->abbrev_full_comparator = NULL;
+
+	remove_index_abbreviations_walker(index, index->root);
+}
+
+static inline void
+prepare_search_index_tuple(TupleIndex index, TupleTableSlot *slot,
+						   TupleIndexSearchEntry entry)
+{
+	SortSupport	sortKey;
+
+	sortKey = &index->sortKeys[0];
+
+	entry->slot = slot;
+	entry->key1 = slot_getattr(slot, sortKey->ssup_attno, &entry->isnull1);
+
+	/* NULL cannot be abbreviated */
+	if (entry->isnull1)
+		return;
+
+	/* abbreviation is not used */
+	if (!sortKey->abbrev_converter)
+		return;
+
+	/* check if abbreviation should be removed */
+	if (index->abbrevNext <= index->ntuples)
+	{
+		index->abbrevNext *= 2;
+
+		if (sortKey->abbrev_abort(index->ntuples, sortKey))
+		{
+			remove_index_abbreviations(index);
+			return;
+		}
+	}
+
+	entry->key1 = sortKey->abbrev_converter(entry->key1, sortKey);
+}
+
+TupleIndexEntry
+TupleIndexLookup(TupleIndex index, TupleTableSlot *searchslot, bool *is_new)
+{
+	TupleIndexEntry entry;
+	TupleIndexSearchEntryData search_entry;
+	bool checkbalance = false;
+
+	prepare_search_index_tuple(index, searchslot, &search_entry);
+
+	entry = tuple_index_node_lookup(index, index->root, &search_entry, is_new,
+									&checkbalance);
+
+	if (entry == NULL)
+		return NULL;
+
+	if (checkbalance)
+		index->root = tuple_index_insert_fixup(index, index->root);
+
+	return entry;
+}
+
+void
+InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	TupleIndexNode min_node;
+
+	/* iterate to the left-most node */
+	min_node = index->root;
+
+	/*
+	 * In-order traversal requires us to keep track of the nodes on our path
+	 * so that we can come back and process them.  The T-tree has a nice
+	 * property for such a traversal: everything to the left of the path has
+	 * already been visited and everything to the right has not.  So to reach
+	 * the next node we either go to the leftmost node of the right subtree
+	 * (recording nodes on the way) or traverse bottom-up to the first
+	 * unvisited node.
+	 *
+	 * To keep track of parent nodes we use a separate height-indexed array.
+	 * Each height holds at most one node of the path, so to find our parent
+	 * we simply move up from our own height.  One problem: AVL balancing
+	 * allows slight imbalance, so there may be no path node at height + 1.
+	 * We call this a height gap and handle it by presetting the 'visited'
+	 * flag, so such entries are skipped; the array is only consulted during
+	 * bottom-up traversal.
+	 */
+	iter->max_height = index->root->height;
+	iter->stack = palloc0(sizeof(TupleIndexIteratorNode) * iter->max_height);
+
+	while (min_node->left != NULL)
+	{
+		TupleIndexIteratorNode *n = &iter->stack[min_node->height - 1];
+		n->node = min_node;
+		n->visited = false;
+
+		/* height-gap */
+		if (min_node->height != min_node->left->height + 1)
+			iter->stack[min_node->left->height].visited = true;
+		min_node = min_node->left;
+	}
+
+	iter->cur_node = min_node;
+	iter->cur_idx = 0;
+}
+
+static TupleIndexNode
+tuple_index_iterator_move_next(TupleIndexIterator iter)
+{
+	TupleIndexNode node = iter->cur_node;
+
+	if (node->right)
+	{
+		TupleIndexNode left;
+
+		/* we have right subtree that is not visited yet */
+
+		/* mark current node as already visited */
+		iter->stack[node->height - 1].visited = true;
+
+		/* height-gap */
+		if (node->height != node->right->height + 1)
+			iter->stack[node->right->height].visited = true;
+
+		/* 
+		 * iterate to the left-most node in this tree and mark every node
+		 * on the way as not visited, so we will traverse them later
+		 */
+		left = node->right;
+		while (left->left != NULL)
+		{
+			TupleIndexIteratorNode *n = &iter->stack[left->height - 1];
+			n->visited = false;
+			n->node = left;
+
+			/* height-gap */
+			if (left->height != left->left->height + 1)
+				iter->stack[left->left->height].visited = true;
+			left = left->left;
+		}
+
+		iter->cur_idx = 0;
+		iter->cur_node = left;
+		return left;
+	}
+	else
+	{
+		int height = node->height + 1;
+
+		/* traverse stack higher and find first not yet visited node */
+
+		/* skip already visited nodes */
+		while (height <= iter->max_height && iter->stack[height - 1].visited)
+			height++;
+
+		if (iter->max_height < height)
+			node = NULL;
+		else
+			node = iter->stack[height - 1].node;
+
+		iter->cur_node = node;
+		iter->cur_idx = 0;
+		return node;
+	}
+}
+
+TupleIndexEntry
+TupleIndexIteratorNext(TupleIndexIterator iter)
+{
+	TupleIndexNode node = iter->cur_node;
+	TupleIndexEntry tuple;
+
+	if (node == NULL)
+		return NULL;
+
+	/* this also handles single empty root node case */
+	if (node->ntuples <= iter->cur_idx)
+	{
+		node = tuple_index_iterator_move_next(iter);
+		if (node == NULL)
+			return NULL;
+	}
+
+	tuple = node->tuples[iter->cur_idx];
+	iter->cur_idx++;
+	return tuple;
+}
+
+/* 
+ * Construct an empty TupleIndex
+ *
+ * inputDesc: tuple descriptor for input tuples
+ * nkeys: number of columns to be compared (length of next 4 arrays)
+ * attNums: attribute numbers used for grouping in sort order
+ * sortOperators: Oids of ordering operators used for comparisons
+ * sortCollations: collations used for comparisons
+ * nullsFirstFlags: per-column nulls-first ordering flags
+ * additionalsize: size of data that may be stored along with the index entry
+ *                 used for storing per-trans information during aggregation
+ * metacxt: memory context for TupleIndex itself
+ * tuplecxt: memory context for storing MinimalTuples
+ * nodecxt: memory context for storing index nodes
+ */
+TupleIndex
+BuildTupleIndex(TupleDesc inputDesc,
+				int nkeys,
+				AttrNumber *attNums,
+				Oid *sortOperators,
+				Oid *sortCollations,
+				bool *nullsFirstFlags,
+				Size additionalsize,
+				MemoryContext metacxt,
+				MemoryContext tuplecxt,
+				MemoryContext nodecxt)
+{
+	TupleIndex index;
+	MemoryContext oldcxt;
+
+	Assert(nkeys > 0);
+
+	additionalsize = MAXALIGN(additionalsize);
+
+	oldcxt = MemoryContextSwitchTo(metacxt);
+
+	index = (TupleIndex) palloc(sizeof(TupleIndexData));
+	index->tuplecxt = tuplecxt;
+	index->nodecxt = nodecxt;
+	index->additionalsize = additionalsize;
+	index->tupDesc = CreateTupleDescCopy(inputDesc);
+	index->root = AllocIndexNode(index);
+	index->ntuples = 0;
+	index->height = 0;
+
+	index->nkeys = nkeys;
+	index->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (int i = 0; i < nkeys; ++i)
+	{
+		SortSupport sortKey = &index->sortKeys[i];
+
+		Assert(AttributeNumberIsValid(attNums[i]));
+		Assert(OidIsValid(sortOperators[i]));
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* abbreviation applies only for the first key */
+		sortKey->abbreviate = i == 0;
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/* Update abbreviation information */
+	if (index->sortKeys[0].abbrev_converter != NULL)
+	{
+		index->abbrevUsed = true;
+		index->abbrevNext = 10;
+		index->abbrevSortOp = sortOperators[0];
+	}
+	else
+		index->abbrevUsed = false;
+
+	MemoryContextSwitchTo(oldcxt);
+	return index;
+}
+
+/* 
+ * Resets contents of the index to be empty, preserving all the non-content
+ * state.
+ */
+void
+ResetTupleIndex(TupleIndex index)
+{
+	SortSupport ssup;
+
+	/* by this time the node and tuple contexts must have been reset by the caller */
+	index->root = AllocIndexNode(index);
+	index->height = 0;
+	index->ntuples = 0;
+
+	if (!index->abbrevUsed)
+		return;
+
+	/*
+	 * If key abbreviation is used then we must reset its state.  All fields
+	 * in SortSupport are already set up, but we clear some of them so that
+	 * it looks as if it were being set up for the first time.
+	 */
+	ssup = &index->sortKeys[0];
+	ssup->comparator = NULL;
+	PrepareSortSupportFromOrderingOp(index->abbrevSortOp, ssup);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 5929aabc353..c923ca6d8a9 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -198,6 +198,71 @@ TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
 }
 #endif
 
+extern TupleIndex BuildTupleIndex(TupleDesc inputDesc,
+								  int nkeys,
+								  AttrNumber *attNums,
+								  Oid *sortOperators,
+								  Oid *sortCollations,
+								  bool *nullsFirstFlags,
+								  Size additionalsize,
+								  MemoryContext metacxt,
+								  MemoryContext tablecxt,
+								  MemoryContext nodecxt);
+extern TupleIndexEntry TupleIndexLookup(TupleIndex index, TupleTableSlot *search,
+		  								bool *is_new);
+extern void ResetTupleIndex(TupleIndex index);
+
+/*
+ * Start iteration over the tuples in the index.  Only the ascending
+ * direction is supported.  No modifications are allowed during iteration,
+ * as they may invalidate the iterator.
+ */
+extern void	InitTupleIndexIterator(TupleIndex index, TupleIndexIterator iter);
+extern TupleIndexEntry TupleIndexIteratorNext(TupleIndexIterator iter);
+static inline void
+ResetTupleIndexIterator(TupleIndex index, TupleIndexIterator iter)
+{
+	InitTupleIndexIterator(index, iter);
+}
+
+#ifndef FRONTEND
+
+/*
+ * Return the size of an index entry.  Useful for estimating memory usage.
+ */
+static inline size_t
+TupleIndexEntrySize(void)
+{
+	return sizeof(TupleIndexEntryData);
+}
+
+/* 
+ * Get a pointer to the additional space allocated for this entry. The
+ * memory will be maxaligned and zeroed.
+ * 
+ * The amount of space available is the additionalsize requested in the call
+ * to BuildTupleIndex(). If additionalsize was specified as zero, return
+ * NULL.
+ */
+static inline void *
+TupleIndexEntryGetAdditional(TupleIndex index, TupleIndexEntry entry)
+{
+	if (index->additionalsize > 0)
+		return (char *) (entry->tuple) - index->additionalsize;
+	else
+		return NULL;
+}
+
+/* 
+ * Return tuple from index entry
+ */
+static inline MinimalTuple
+TupleIndexEntryGetMinimalTuple(TupleIndexEntry entry)
+{
+	return entry->tuple;
+}
+
+#endif
+
 /*
  * prototypes from functions in execJunk.c
  */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 02265456978..c45352a7dc1 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -900,7 +900,94 @@ typedef tuplehash_iterator TupleHashIterator;
 #define ScanTupleHashTable(htable, iter) \
 	tuplehash_iterate(htable->hashtab, iter)
 
+/* ---------------------------------------------------------------
+ * 				Tuple T-tree index
+ *
+ * All-in-memory tuple T-tree index used for grouping and aggregating.
+ * ---------------------------------------------------------------
+ */
+
+/*
+ * Representation of a tuple in the index.  It stores both the tuple and
+ * first-key information.  If key abbreviation is in use, key1 holds the
+ * abbreviated key.
+ */
+typedef struct TupleIndexEntryData
+{
+	MinimalTuple tuple;	/* actual stored tuple */
+	Datum	key1;		/* value of first key */
+	bool	isnull1;	/* first key is null */
+} TupleIndexEntryData;
+
+typedef TupleIndexEntryData *TupleIndexEntry;
+
+/*
+ * Btree node of the tuple index.  Common to both internal and leaf nodes.
+ */
+typedef struct TupleIndexNodeData
+{
+	/* left node with keys less than minimum */
+	struct TupleIndexNodeData *left;
+	/* right node with keys greater than maximum */
+	struct TupleIndexNodeData *right;
+	/* number of tuples in the node */
+	int ntuples;
+	/* height of the node */
+	int height;
+
+/*
+ * Maximum number of tuples stored in a tuple index node.
+ * 125 makes the node size fit cache lines exactly.
+ */
+#define TUPLE_INDEX_NODE_MAX_ENTRIES  125
 
+	/*
+	 * Array of entries for this node.
+	 *
+	 * For internal nodes these are separator keys; for leaf nodes, the
+	 * actual tuples.
+	 */
+	TupleIndexEntry tuples[TUPLE_INDEX_NODE_MAX_ENTRIES];
+} TupleIndexNodeData;
+
+typedef TupleIndexNodeData *TupleIndexNode;
+
+typedef struct TupleIndexData
+{
+	TupleDesc	tupDesc;		/* descriptor for stored tuples */
+	TupleIndexNode root;		/* root of the tree */
+	int		height;				/* current tree height */
+	int		ntuples;			/* number of tuples in index */
+	int		nkeys;				/* number of keys in tuple */
+	SortSupport	sortKeys;		/* support functions for key comparison */
+	MemoryContext	tuplecxt;	/* memory context containing tuples */
+	MemoryContext	nodecxt;	/* memory context containing index nodes */
+	Size	additionalsize;		/* size of additional data for tuple */
+	int		abbrevNext;			/* next time we should check abbreviation 
+									* optimization efficiency */
+	bool	abbrevUsed;			/* true if key abbreviation optimization
+									* was ever used */
+	Oid		abbrevSortOp;		/* sort operator for first key */
+} TupleIndexData;
+
+typedef struct TupleIndexData *TupleIndex;
+
+typedef struct TupleIndexIteratorNode
+{
+	TupleIndexNode node;	/* index node itself */
+	bool visited;			/* was this node visited yet? */
+} TupleIndexIteratorNode;
+
+typedef struct TupleIndexIteratorData
+{
+	TupleIndexIteratorNode	*stack;	/* stack of traversed nodes */
+	int				max_height;	/* max height of tree (root height) */
+	TupleIndexNode	cur_node;	/* current node we are iterating */
+	OffsetNumber	cur_idx;	/* index of tuple in cur_node to return next */
+} TupleIndexIteratorData;
+
+typedef TupleIndexIteratorData *TupleIndexIterator;
+
 /* ----------------------------------------------------------------
  *				 Expression State Nodes
  *
@@ -2529,6 +2616,7 @@ typedef struct AggStatePerTransData *AggStatePerTrans;
 typedef struct AggStatePerGroupData *AggStatePerGroup;
 typedef struct AggStatePerPhaseData *AggStatePerPhase;
 typedef struct AggStatePerHashData *AggStatePerHash;
+typedef struct AggStatePerIndexData *AggStatePerIndex;
 
 typedef struct AggState
 {
@@ -2544,17 +2632,18 @@ typedef struct AggState
 	AggStatePerAgg peragg;		/* per-Aggref information */
 	AggStatePerTrans pertrans;	/* per-Trans state information */
 	ExprContext *hashcontext;	/* econtexts for long-lived data (hashtable) */
+	ExprContext *indexcontext;	/* econtexts for long-lived data (index) */
 	ExprContext **aggcontexts;	/* econtexts for long-lived data (per GS) */
 	ExprContext *tmpcontext;	/* econtext for input expressions */
-#define FIELDNO_AGGSTATE_CURAGGCONTEXT 14
+#define FIELDNO_AGGSTATE_CURAGGCONTEXT 15
 	ExprContext *curaggcontext; /* currently active aggcontext */
 	AggStatePerAgg curperagg;	/* currently active aggregate, if any */
-#define FIELDNO_AGGSTATE_CURPERTRANS 16
+#define FIELDNO_AGGSTATE_CURPERTRANS 17
 	AggStatePerTrans curpertrans;	/* currently active trans state, if any */
 	bool		input_done;		/* indicates end of input */
 	bool		agg_done;		/* indicates completion of Agg scan */
 	int			projected_set;	/* The last projected grouping set */
-#define FIELDNO_AGGSTATE_CURRENT_SET 20
+#define FIELDNO_AGGSTATE_CURRENT_SET 21
 	int			current_set;	/* The current grouping set being evaluated */
 	Bitmapset  *grouped_cols;	/* grouped cols in current projection */
 	List	   *all_grouped_cols;	/* list of all grouped cols in DESC order */
@@ -2576,32 +2665,43 @@ typedef struct AggState
 	int			num_hashes;
 	MemoryContext hash_metacxt; /* memory for hash table bucket array */
 	MemoryContext hash_tuplescxt;	/* memory for hash table tuples */
-	struct LogicalTapeSet *hash_tapeset;	/* tape set for hash spill tapes */
-	struct HashAggSpill *hash_spills;	/* HashAggSpill for each grouping set,
-										 * exists only during first pass */
-	TupleTableSlot *hash_spill_rslot;	/* for reading spill files */
-	TupleTableSlot *hash_spill_wslot;	/* for writing spill files */
-	List	   *hash_batches;	/* hash batches remaining to be processed */
-	bool		hash_ever_spilled;	/* ever spilled during this execution? */
-	bool		hash_spill_mode;	/* we hit a limit during the current batch
-									 * and we must not create new groups */
-	Size		hash_mem_limit; /* limit before spilling hash table */
-	uint64		hash_ngroups_limit; /* limit before spilling hash table */
-	int			hash_planned_partitions;	/* number of partitions planned
-											 * for first pass */
-	double		hashentrysize;	/* estimate revised during execution */
-	Size		hash_mem_peak;	/* peak hash table memory usage */
-	uint64		hash_ngroups_current;	/* number of groups currently in
-										 * memory in all hash tables */
-	uint64		hash_disk_used; /* kB of disk space used */
-	int			hash_batches_used;	/* batches used during entire execution */
-
 	AggStatePerHash perhash;	/* array of per-hashtable data */
 	AggStatePerGroup *hash_pergroup;	/* grouping set indexed array of
 										 * per-group pointers */
+	/* Fields used for managing spill mode in hash and index aggs */
+	struct LogicalTapeSet *spill_tapeset;	/* tape set for hash spill tapes */
+	struct HashAggSpill *spills;	/* HashAggSpill for each grouping set,
+									 * exists only during first pass */
+	TupleTableSlot *spill_rslot;	/* for reading spill files */
+	TupleTableSlot *spill_wslot;	/* for writing spill files */
+	List	   *spill_batches;	/* hash batches remaining to be processed */
+
+	bool		spill_ever_happened;	/* ever spilled during this execution? */
+	bool		spill_mode;	/* we hit a limit during the current batch
+							 * and we must not create new groups */
+	Size		spill_mem_limit; /* limit before spilling hash table or index */
+	uint64		spill_ngroups_limit; /* limit before spilling hash table or index */
+	int			spill_planned_partitions;	/* number of partitions planned
+											 * for first pass */
+	double		hashentrysize;	/* estimate revised during execution */
+	Size		spill_mem_peak;	/* peak memory usage of hash table or index */
+	uint64		spill_ngroups_current;	/* number of groups currently in
+										 * memory in all hash tables */
+	uint64		spill_disk_used; /* kB of disk space used */
+	int			spill_batches_used;	/* batches used during entire execution */
+
+	/* these fields are used in AGG_INDEXED mode: */
+	AggStatePerIndex perindex;	/* pointer to per-index state data */
+	bool			index_filled;	/* index filled yet? */
+	MemoryContext	index_metacxt;	/* memory for index structure */
+	MemoryContext	index_nodecxt;	/* memory for index nodes */
+	MemoryContext	index_entrycxt;	/* memory for index entries */
+	Sort		   *index_sort;		/* ordering information for index */
+	Tuplesortstate *mergestate;		/* state for merging projected tuples if
+									 * spill occurred */
 
 	/* support for evaluation of agg input expressions: */
-#define FIELDNO_AGGSTATE_ALL_PERGROUPS 54
+#define FIELDNO_AGGSTATE_ALL_PERGROUPS 62
 	AggStatePerGroup *all_pergroups;	/* array of first ->pergroups, than
 										 * ->hash_pergroup */
 	SharedAggInfo *shared_info; /* one entry per worker */
-- 
2.43.0

v4-0002-introduce-AGG_INDEX-grouping-strategy-node.patchtext/x-patch; charset=UTF-8; name=v4-0002-introduce-AGG_INDEX-grouping-strategy-node.patchDownload
From 2a790b6755cc365feb2bfe1b6f27587a5ef9a944 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 16:41:58 +0300
Subject: [PATCH v4 2/5] introduce AGG_INDEX grouping strategy node

AGG_INDEX is a new grouping strategy that builds an in-memory index and
uses it for grouping. The main advantage of this approach is that the
output is ordered by the grouping columns, and if an ORDER BY is
specified, it is used when building the grouping/sorting columns.

The index is the B+tree implemented in the previous commit. Overall the
implementation is very close to AGG_HASHED:

- maintain an in-memory grouping structure
- track memory consumption
- if the memory limit is reached, spill data to disk in batches (keyed
  by a hash of the key columns)
- hash batches are processed one after another, filling a new in-memory
  structure for each batch

For this reason much of the code is generalized to support both the
index and hash implementations: functions are generalized via flag
arguments (e.g. 'ishash'), spill-related members of AggState are
renamed with a 'spill_' prefix instead of 'hash_', etc.

Most differences are in the spill logic: to preserve sort order in case
of a disk spill we must dump all indexes to disk to create sorted runs
and perform a final external merge.

One problem is the external merge. It is adapted from tuplesort.c by
introducing a new operational mode - tuplemerge (with its own prefix).
Internally we just set up the state accordingly and proceed as before
without any significant code changes.

Another problem is which tuples to save into the sorted runs. We decided
to store tuples after projection (when their aggregates are finalized),
because the internal transition info is represented by a
value/isnull/novalue triple (in AggStatePerGroupData), which is quite
hard to serialize and handle; after projection all GROUP BY attributes
are preserved, so we can access them during the merge. Also, projection
applies the filter, so it can discard some tuples.
---
 src/backend/executor/execExpr.c            |   31 +-
 src/backend/executor/nodeAgg.c             | 1379 +++++++++++++++++---
 src/backend/utils/sort/tuplesort.c         |  209 ++-
 src/backend/utils/sort/tuplesortvariants.c |  105 ++
 src/include/executor/executor.h            |   10 +-
 src/include/executor/nodeAgg.h             |   33 +-
 src/include/nodes/nodes.h                  |    1 +
 src/include/nodes/plannodes.h              |    2 +-
 src/include/utils/tuplesort.h              |   17 +-
 9 files changed, 1583 insertions(+), 204 deletions(-)

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index e0a1fb76aa8..adf766c9d24 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -94,7 +94,7 @@ static void ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
 static void ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 								  ExprEvalStep *scratch,
 								  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-								  int transno, int setno, int setoff, bool ishash,
+								  int transno, int setno, int setoff, int strategy,
 								  bool nullcheck);
 static void ExecInitJsonExpr(JsonExpr *jsexpr, ExprState *state,
 							 Datum *resv, bool *resnull,
@@ -3667,7 +3667,7 @@ ExecInitCoerceToDomain(ExprEvalStep *scratch, CoerceToDomain *ctest,
  */
 ExprState *
 ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
-				  bool doSort, bool doHash, bool nullcheck)
+				  int groupStrategy, bool nullcheck)
 {
 	ExprState  *state = makeNode(ExprState);
 	PlanState  *parent = &aggstate->ss.ps;
@@ -3925,7 +3925,7 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 		 * grouping set). Do so for both sort and hash based computations, as
 		 * applicable.
 		 */
-		if (doSort)
+		if (groupStrategy & GROUPING_STRATEGY_SORT)
 		{
 			int			processGroupingSets = Max(phase->numsets, 1);
 			int			setoff = 0;
@@ -3933,13 +3933,13 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < processGroupingSets; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, false,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_SORT, nullcheck);
 				setoff++;
 			}
 		}
 
-		if (doHash)
+		if (groupStrategy & GROUPING_STRATEGY_HASH)
 		{
 			int			numHashes = aggstate->num_hashes;
 			int			setoff;
@@ -3953,12 +3953,19 @@ ExecBuildAggTrans(AggState *aggstate, AggStatePerPhase phase,
 			for (int setno = 0; setno < numHashes; setno++)
 			{
 				ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
-									  pertrans, transno, setno, setoff, true,
-									  nullcheck);
+									  pertrans, transno, setno, setoff,
+									  GROUPING_STRATEGY_HASH, nullcheck);
 				setoff++;
 			}
 		}
 
+		if (groupStrategy & GROUPING_STRATEGY_INDEX)
+		{
+			ExecBuildAggTransCall(state, aggstate, &scratch, trans_fcinfo,
+								  pertrans, transno, 0, 0,
+								  GROUPING_STRATEGY_INDEX, nullcheck);
+		}
+
 		/* adjust early bail out jump target(s) */
 		foreach(bail, adjust_bailout)
 		{
@@ -4011,16 +4018,18 @@ static void
 ExecBuildAggTransCall(ExprState *state, AggState *aggstate,
 					  ExprEvalStep *scratch,
 					  FunctionCallInfo fcinfo, AggStatePerTrans pertrans,
-					  int transno, int setno, int setoff, bool ishash,
+					  int transno, int setno, int setoff, int strategy,
 					  bool nullcheck)
 {
 	ExprContext *aggcontext;
 	int			adjust_jumpnull = -1;
 
-	if (ishash)
+	if (strategy & GROUPING_STRATEGY_HASH)
 		aggcontext = aggstate->hashcontext;
-	else
+	else if (strategy & GROUPING_STRATEGY_SORT)
 		aggcontext = aggstate->aggcontexts[setno];
+	else
+		aggcontext = aggstate->indexcontext;
 
 	/* add check for NULL pointer? */
 	if (nullcheck)
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index baa76596ac2..f165d6f0480 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -364,7 +364,7 @@ typedef struct FindColsContext
 	Bitmapset  *unaggregated;	/* other column references */
 } FindColsContext;
 
-static void select_current_set(AggState *aggstate, int setno, bool is_hash);
+static void select_current_set(AggState *aggstate, int setno, int strategy);
 static void initialize_phase(AggState *aggstate, int newphase);
 static TupleTableSlot *fetch_input_tuple(AggState *aggstate);
 static void initialize_aggregates(AggState *aggstate,
@@ -403,8 +403,8 @@ static void find_cols(AggState *aggstate, Bitmapset **aggregated,
 static bool find_cols_walker(Node *node, FindColsContext *context);
 static void build_hash_tables(AggState *aggstate);
 static void build_hash_table(AggState *aggstate, int setno, double nbuckets);
-static void hashagg_recompile_expressions(AggState *aggstate, bool minslot,
-										  bool nullcheck);
+static void agg_recompile_expressions(AggState *aggstate, bool minslot,
+									  bool nullcheck);
 static void hash_create_memory(AggState *aggstate);
 static double hash_choose_num_buckets(double hashentrysize,
 									  double ngroups, Size memory);
@@ -431,13 +431,13 @@ static HashAggBatch *hashagg_batch_new(LogicalTape *input_tape, int setno,
 									   int64 input_tuples, double input_card,
 									   int used_bits);
 static MinimalTuple hashagg_batch_read(HashAggBatch *batch, uint32 *hashp);
-static void hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
-							   int used_bits, double input_groups,
-							   double hashentrysize);
-static Size hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-								TupleTableSlot *inputslot, uint32 hash);
-static void hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill,
-								 int setno);
+static void agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset,
+						   int used_bits, double input_groups,
+						   double hashentrysize);
+static Size agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+							TupleTableSlot *inputslot, uint32 hash);
+static void agg_spill_finish(AggState *aggstate, HashAggSpill *spill,
+							 int setno);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  AggState *aggstate, EState *estate,
@@ -446,21 +446,27 @@ static void build_pertrans_for_aggref(AggStatePerTrans pertrans,
 									  Oid aggdeserialfn, Datum initValue,
 									  bool initValueIsNull, Oid *inputTypes,
 									  int numArguments);
-
+static void agg_fill_index(AggState *state);
+static TupleTableSlot *agg_retrieve_index(AggState *state);
+static void lookup_index_entries(AggState *state);
+static void indexagg_finish_initial_spills(AggState *aggstate);
+static void index_agg_enter_spill_mode(AggState *aggstate);
 
 /*
  * Select the current grouping set; affects current_set and
  * curaggcontext.
  */
 static void
-select_current_set(AggState *aggstate, int setno, bool is_hash)
+select_current_set(AggState *aggstate, int setno, int strategy)
 {
 	/*
 	 * When changing this, also adapt ExecAggPlainTransByVal() and
 	 * ExecAggPlainTransByRef().
 	 */
-	if (is_hash)
+	if (strategy == GROUPING_STRATEGY_HASH)
 		aggstate->curaggcontext = aggstate->hashcontext;
+	else if (strategy == GROUPING_STRATEGY_INDEX)
+		aggstate->curaggcontext = aggstate->indexcontext;
 	else
 		aggstate->curaggcontext = aggstate->aggcontexts[setno];
 
@@ -680,7 +686,7 @@ initialize_aggregates(AggState *aggstate,
 	{
 		AggStatePerGroup pergroup = pergroups[setno];
 
-		select_current_set(aggstate, setno, false);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_SORT);
 
 		for (transno = 0; transno < numTrans; transno++)
 		{
@@ -1478,7 +1484,7 @@ build_hash_tables(AggState *aggstate)
 			continue;
 		}
 
-		memory = aggstate->hash_mem_limit / aggstate->num_hashes;
+		memory = aggstate->spill_mem_limit / aggstate->num_hashes;
 
 		/* choose reasonable number of buckets per hashtable */
 		nbuckets = hash_choose_num_buckets(aggstate->hashentrysize,
@@ -1496,7 +1502,7 @@ build_hash_tables(AggState *aggstate)
 		build_hash_table(aggstate, setno, nbuckets);
 	}
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 }
 
 /*
@@ -1728,7 +1734,7 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
 }
 
 /*
- * hashagg_recompile_expressions()
+ * agg_recompile_expressions()
  *
  * Identifies the right phase, compiles the right expression given the
  * arguments, and then sets phase->evalfunc to that expression.
@@ -1746,34 +1752,47 @@ hash_agg_entry_size(int numTrans, Size tupleWidth, Size transitionSpace)
  * expressions in the AggStatePerPhase, and reuse when appropriate.
  */
 static void
-hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
+agg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 {
 	AggStatePerPhase phase;
 	int			i = minslot ? 1 : 0;
 	int			j = nullcheck ? 1 : 0;
 
 	Assert(aggstate->aggstrategy == AGG_HASHED ||
-		   aggstate->aggstrategy == AGG_MIXED);
+		   aggstate->aggstrategy == AGG_MIXED ||
+		   aggstate->aggstrategy == AGG_INDEX);
 
-	if (aggstate->aggstrategy == AGG_HASHED)
-		phase = &aggstate->phases[0];
-	else						/* AGG_MIXED */
+	if (aggstate->aggstrategy == AGG_MIXED)
 		phase = &aggstate->phases[1];
+	else						/* AGG_HASHED or AGG_INDEX */
+		phase = &aggstate->phases[0];
 
 	if (phase->evaltrans_cache[i][j] == NULL)
 	{
 		const TupleTableSlotOps *outerops = aggstate->ss.ps.outerops;
 		bool		outerfixed = aggstate->ss.ps.outeropsfixed;
-		bool		dohash = true;
-		bool		dosort = false;
+		int			strategy = 0;
 
-		/*
-		 * If minslot is true, that means we are processing a spilled batch
-		 * (inside agg_refill_hash_table()), and we must not advance the
-		 * sorted grouping sets.
-		 */
-		if (aggstate->aggstrategy == AGG_MIXED && !minslot)
-			dosort = true;
+		switch (aggstate->aggstrategy)
+		{
+			case AGG_MIXED:
+				/*
+				 * If minslot is true, that means we are processing a spilled batch
+				 * (inside agg_refill_hash_table()), and we must not advance the
+				 * sorted grouping sets.
+				 */
+				if (!minslot)
+					strategy |= GROUPING_STRATEGY_SORT;
+				/* FALLTHROUGH */
+			case AGG_HASHED:
+				strategy |= GROUPING_STRATEGY_HASH;
+				break;
+			case AGG_INDEX:
+				strategy |= GROUPING_STRATEGY_INDEX;
+				break;
+			default:
+				Assert(false);
+		}
 
 		/* temporarily change the outerops while compiling the expression */
 		if (minslot)
@@ -1783,8 +1802,7 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
 		}
 
 		phase->evaltrans_cache[i][j] = ExecBuildAggTrans(aggstate, phase,
-														 dosort, dohash,
-														 nullcheck);
+														 strategy, nullcheck);
 
 		/* change back */
 		aggstate->ss.ps.outerops = outerops;
@@ -1803,9 +1821,9 @@ hashagg_recompile_expressions(AggState *aggstate, bool minslot, bool nullcheck)
  * substantially larger than the initial value.
  */
 void
-hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
-					Size *mem_limit, uint64 *ngroups_limit,
-					int *num_partitions)
+agg_set_limits(double hashentrysize, double input_groups, int used_bits,
+			   Size *mem_limit, uint64 *ngroups_limit,
+			   int *num_partitions)
 {
 	int			npartitions;
 	Size		partition_mem;
@@ -1853,6 +1871,18 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 		*ngroups_limit = 1;
 }
 
+static inline bool
+agg_spill_required(AggState *aggstate, Size total_mem)
+{
+	/*
+	 * Don't spill unless there's at least one group in the hash table or
+	 * index, so we can be sure to make progress even in edge cases.
+	 */
+	return aggstate->spill_ngroups_current > 0 &&
+			(total_mem > aggstate->spill_mem_limit ||
+			 aggstate->spill_ngroups_current > aggstate->spill_ngroups_limit);
+}
+
 /*
  * hash_agg_check_limits
  *
@@ -1863,7 +1893,6 @@ hash_agg_set_limits(double hashentrysize, double input_groups, int used_bits,
 static void
 hash_agg_check_limits(AggState *aggstate)
 {
-	uint64		ngroups = aggstate->hash_ngroups_current;
 	Size		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt,
 													 true);
 	Size		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt,
@@ -1874,7 +1903,7 @@ hash_agg_check_limits(AggState *aggstate)
 	bool		do_spill = false;
 
 #ifdef USE_INJECTION_POINTS
-	if (ngroups >= 1000)
+	if (aggstate->spill_ngroups_current >= 1000)
 	{
 		if (IS_INJECTION_POINT_ATTACHED("hash-aggregate-spill-1000"))
 		{
@@ -1888,9 +1917,7 @@ hash_agg_check_limits(AggState *aggstate)
 	 * Don't spill unless there's at least one group in the hash table so we
 	 * can be sure to make progress even in edge cases.
 	 */
-	if (aggstate->hash_ngroups_current > 0 &&
-		(total_mem > aggstate->hash_mem_limit ||
-		 ngroups > aggstate->hash_ngroups_limit))
+	if (agg_spill_required(aggstate, total_mem))
 	{
 		do_spill = true;
 	}
@@ -1899,68 +1926,150 @@ hash_agg_check_limits(AggState *aggstate)
 		hash_agg_enter_spill_mode(aggstate);
 }
 
+static void
+index_agg_check_limits(AggState *aggstate)
+{
+	Size		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt,
+													 true);
+	Size		node_mem = MemoryContextMemAllocated(aggstate->index_nodecxt,
+													 true);
+	Size		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt,
+													  true);
+	Size		tval_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory,
+													 true);
+	Size		total_mem = meta_mem + node_mem + entry_mem + tval_mem;
+	bool		do_spill = false;
+
+#ifdef USE_INJECTION_POINTS
+	if (aggstate->spill_ngroups_current >= 1000)
+	{
+		if (IS_INJECTION_POINT_ATTACHED("index-aggregate-spill-1000"))
+		{
+			do_spill = true;
+			INJECTION_POINT_CACHED("index-aggregate-spill-1000", NULL);
+		}
+	}
+#endif
+
+	if (agg_spill_required(aggstate, total_mem))
+	{
+		do_spill = true;
+	}
+
+	if (do_spill)
+		index_agg_enter_spill_mode(aggstate);
+}
+
 /*
  * Enter "spill mode", meaning that no new groups are added to any of the hash
  * tables. Tuples that would create a new group are instead spilled, and
  * processed later.
  */
-static void
-hash_agg_enter_spill_mode(AggState *aggstate)
+static inline void
+agg_enter_spill_mode(AggState *aggstate, bool ishash)
 {
-	INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
-	aggstate->hash_spill_mode = true;
-	hashagg_recompile_expressions(aggstate, aggstate->table_filled, true);
-
-	if (!aggstate->hash_ever_spilled)
+	if (ishash)
 	{
-		Assert(aggstate->hash_tapeset == NULL);
-		Assert(aggstate->hash_spills == NULL);
-
-		aggstate->hash_ever_spilled = true;
-
-		aggstate->hash_tapeset = LogicalTapeSetCreate(true, NULL, -1);
+		INJECTION_POINT("hash-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->table_filled, true);
+	}
+	else
+	{
+		INJECTION_POINT("index-aggregate-enter-spill-mode", NULL);
+		aggstate->spill_mode = true;
+		agg_recompile_expressions(aggstate, aggstate->index_filled, true);
+	}
+
+	if (!aggstate->spill_ever_happened)
+	{
+		Assert(aggstate->spill_tapeset == NULL);
+		Assert(aggstate->spills == NULL);
 
-		aggstate->hash_spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+		aggstate->spill_ever_happened = true;
+		aggstate->spill_tapeset = LogicalTapeSetCreate(true, NULL, -1);
 
-		for (int setno = 0; setno < aggstate->num_hashes; setno++)
+		if (ishash)
 		{
-			AggStatePerHash perhash = &aggstate->perhash[setno];
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
-
-			hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
+			aggstate->spills = palloc_array(HashAggSpill, aggstate->num_hashes);
+
+			for (int setno = 0; setno < aggstate->num_hashes; setno++)
+			{
+				AggStatePerHash perhash = &aggstate->perhash[setno];
+				HashAggSpill *spill = &aggstate->spills[setno];
+
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
 							   perhash->aggnode->numGroups,
 							   aggstate->hashentrysize);
+			}
+		}
+		else
+		{
+			aggstate->spills = palloc(sizeof(HashAggSpill));
+			agg_spill_init(aggstate->spills, aggstate->spill_tapeset, 0,
+						   aggstate->perindex->aggnode->numGroups,
+						   aggstate->hashentrysize);
 		}
 	}
 }
 
+static void
+hash_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, true);
+}
+
+static void
+index_agg_enter_spill_mode(AggState *aggstate)
+{
+	agg_enter_spill_mode(aggstate, false);
+}
+
 /*
  * Update metrics after filling the hash table.
  *
  * If reading from the outer plan, from_tape should be false; if reading from
  * another tape, from_tape should be true.
  */
-static void
-hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+static inline void
+agg_update_spill_metrics(AggState *aggstate, bool from_tape, int npartitions, bool ishash)
 {
 	Size		meta_mem;
 	Size		entry_mem;
-	Size		hashkey_mem;
+	Size		key_mem;
 	Size		buffer_mem;
 	Size		total_mem;
 
 	if (aggstate->aggstrategy != AGG_MIXED &&
-		aggstate->aggstrategy != AGG_HASHED)
+		aggstate->aggstrategy != AGG_HASHED &&
+		aggstate->aggstrategy != AGG_INDEX)
 		return;
 
-	/* memory for the hash table itself */
-	meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
-
-	/* memory for hash entries */
-	entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
-
-	/* memory for byref transition states */
-	hashkey_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	if (ishash)
+	{
+		/* memory for the hash table itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->hash_metacxt, true);
+
+		/* memory for hash entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->hash_tuplescxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->hashcontext->ecxt_per_tuple_memory, true);
+	}
+	else
+	{
+		/* memory for the index itself */
+		meta_mem = MemoryContextMemAllocated(aggstate->index_metacxt, true);
+
+		/* memory for the index nodes */
+		meta_mem += MemoryContextMemAllocated(aggstate->index_nodecxt, true);
+
+		/* memory for index entries */
+		entry_mem = MemoryContextMemAllocated(aggstate->index_entrycxt, true);
+
+		/* memory for byref transition states */
+		key_mem = MemoryContextMemAllocated(aggstate->indexcontext->ecxt_per_tuple_memory, true);
+	}
 
 	/* memory for read/write tape buffers, if spilled */
 	buffer_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
@@ -1968,28 +2077,49 @@ hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
 		buffer_mem += HASHAGG_READ_BUFFER_SIZE;
 
 	/* update peak mem */
-	total_mem = meta_mem + entry_mem + hashkey_mem + buffer_mem;
-	if (total_mem > aggstate->hash_mem_peak)
-		aggstate->hash_mem_peak = total_mem;
+	total_mem = meta_mem + entry_mem + key_mem + buffer_mem;
+	if (total_mem > aggstate->spill_mem_peak)
+		aggstate->spill_mem_peak = total_mem;
 
 	/* update disk usage */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
 	{
-		uint64		disk_used = LogicalTapeSetBlocks(aggstate->hash_tapeset) * (BLCKSZ / 1024);
+		uint64		disk_used = LogicalTapeSetBlocks(aggstate->spill_tapeset) * (BLCKSZ / 1024);
 
-		if (aggstate->hash_disk_used < disk_used)
-			aggstate->hash_disk_used = disk_used;
+		if (aggstate->spill_disk_used < disk_used)
+			aggstate->spill_disk_used = disk_used;
 	}
 
 	/* update hashentrysize estimate based on contents */
-	if (aggstate->hash_ngroups_current > 0)
+	if (aggstate->spill_ngroups_current > 0)
 	{
-		aggstate->hashentrysize =
-			TupleHashEntrySize() +
-			(hashkey_mem / (double) aggstate->hash_ngroups_current);
+		if (ishash)
+		{
+			aggstate->hashentrysize =
+				TupleHashEntrySize() +
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
+		else
+		{
+			/* index stores MinimalTuples directly without any wrapper */
+			aggstate->hashentrysize =
+				(key_mem / (double) aggstate->spill_ngroups_current);
+		}
 	}
 }
 
+static void
+hash_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, true);
+}
+
+static void
+index_agg_update_metrics(AggState *aggstate, bool from_tape, int npartitions)
+{
+	agg_update_spill_metrics(aggstate, from_tape, npartitions, false);
+}
+
 /*
  * Create memory contexts used for hash aggregation.
  */
@@ -2048,6 +2178,33 @@ hash_create_memory(AggState *aggstate)
 
 }
 
+/*
+ * Create memory contexts used for index aggregation.
+ */
+static void
+index_create_memory(AggState *aggstate)
+{
+	Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
+
+	aggstate->indexcontext = CreateWorkExprContext(aggstate->ss.ps.state);
+
+	aggstate->index_metacxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+													"IndexAgg meta context",
+													ALLOCSET_DEFAULT_SIZES);
+	aggstate->index_nodecxt = BumpContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg node context",
+												ALLOCSET_SMALL_SIZES);
+
+	maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);
+	maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);
+	maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);
+	aggstate->index_entrycxt = AllocSetContextCreate(aggstate->ss.ps.state->es_query_cxt,
+												"IndexAgg table context",
+												ALLOCSET_DEFAULT_MINSIZE,
+												ALLOCSET_DEFAULT_INITSIZE,
+												maxBlockSize);
+}
+
 /*
  * Choose a reasonable number of buckets for the initial hash table size.
  */
@@ -2141,7 +2298,7 @@ initialize_hash_entry(AggState *aggstate, TupleHashTable hashtable,
 	AggStatePerGroup pergroup;
 	int			transno;
 
-	aggstate->hash_ngroups_current++;
+	aggstate->spill_ngroups_current++;
 	hash_agg_check_limits(aggstate);
 
 	/* no need to allocate or initialize per-group state */
@@ -2196,9 +2353,9 @@ lookup_hash_entries(AggState *aggstate)
 		bool	   *p_isnew;
 
 		/* if hash table already spilled, don't create new entries */
-		p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
-		select_current_set(aggstate, setno, true);
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_HASH);
 		prepare_hash_slot(perhash,
 						  outerslot,
 						  hashslot);
@@ -2214,15 +2371,15 @@ lookup_hash_entries(AggState *aggstate)
 		}
 		else
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 			TupleTableSlot *slot = aggstate->tmpcontext->ecxt_outertuple;
 
 			if (spill->partitions == NULL)
-				hashagg_spill_init(spill, aggstate->hash_tapeset, 0,
-								   perhash->aggnode->numGroups,
-								   aggstate->hashentrysize);
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perhash->aggnode->numGroups,
+							   aggstate->hashentrysize);
 
-			hashagg_spill_tuple(aggstate, spill, slot, hash);
+			agg_spill_tuple(aggstate, spill, slot, hash);
 			pergroup[setno] = NULL;
 		}
 	}
@@ -2265,6 +2422,12 @@ ExecAgg(PlanState *pstate)
 			case AGG_SORTED:
 				result = agg_retrieve_direct(node);
 				break;
+			case AGG_INDEX:
+				if (!node->index_filled)
+					agg_fill_index(node);
+
+				result = agg_retrieve_index(node);
+				break;
 		}
 
 		if (!TupIsNull(result))
@@ -2381,7 +2544,7 @@ agg_retrieve_direct(AggState *aggstate)
 				aggstate->table_filled = true;
 				ResetTupleHashIterator(aggstate->perhash[0].hashtable,
 									   &aggstate->perhash[0].hashiter);
-				select_current_set(aggstate, 0, true);
+				select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
 				return agg_retrieve_hash_table(aggstate);
 			}
 			else
@@ -2601,7 +2764,7 @@ agg_retrieve_direct(AggState *aggstate)
 
 		prepare_projection_slot(aggstate, econtext->ecxt_outertuple, currentSet);
 
-		select_current_set(aggstate, currentSet, false);
+		select_current_set(aggstate, currentSet, GROUPING_STRATEGY_SORT);
 
 		finalize_aggregates(aggstate,
 							peragg,
@@ -2683,19 +2846,19 @@ agg_refill_hash_table(AggState *aggstate)
 	HashAggBatch *batch;
 	AggStatePerHash perhash;
 	HashAggSpill spill;
-	LogicalTapeSet *tapeset = aggstate->hash_tapeset;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
 	bool		spill_initialized = false;
 
-	if (aggstate->hash_batches == NIL)
+	if (aggstate->spill_batches == NIL)
 		return false;
 
-	/* hash_batches is a stack, with the top item at the end of the list */
+	/* spill_batches is a stack, with the top item at the end of the list */
-	batch = llast(aggstate->hash_batches);
-	aggstate->hash_batches = list_delete_last(aggstate->hash_batches);
+	batch = llast(aggstate->spill_batches);
+	aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
 
-	hash_agg_set_limits(aggstate->hashentrysize, batch->input_card,
-						batch->used_bits, &aggstate->hash_mem_limit,
-						&aggstate->hash_ngroups_limit, NULL);
+	agg_set_limits(aggstate->hashentrysize, batch->input_card,
+				   batch->used_bits, &aggstate->spill_mem_limit,
+				   &aggstate->spill_ngroups_limit, NULL);
 
 	/*
 	 * Each batch only processes one grouping set; set the rest to NULL so
@@ -2712,7 +2875,7 @@ agg_refill_hash_table(AggState *aggstate)
 	for (int setno = 0; setno < aggstate->num_hashes; setno++)
 		ResetTupleHashTable(aggstate->perhash[setno].hashtable);
 
-	aggstate->hash_ngroups_current = 0;
+	aggstate->spill_ngroups_current = 0;
 
 	/*
 	 * In AGG_MIXED mode, hash aggregation happens in phase 1 and the output
@@ -2726,7 +2889,7 @@ agg_refill_hash_table(AggState *aggstate)
 		aggstate->phase = &aggstate->phases[aggstate->current_phase];
 	}
 
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 
 	perhash = &aggstate->perhash[aggstate->current_set];
 
@@ -2737,19 +2900,19 @@ agg_refill_hash_table(AggState *aggstate)
 	 * We still need the NULL check, because we are only processing one
 	 * grouping set at a time and the rest will be NULL.
 	 */
-	hashagg_recompile_expressions(aggstate, true, true);
+	agg_recompile_expressions(aggstate, true, true);
 
 	INJECTION_POINT("hash-aggregate-process-batch", NULL);
 	for (;;)
 	{
-		TupleTableSlot *spillslot = aggstate->hash_spill_rslot;
+		TupleTableSlot *spillslot = aggstate->spill_rslot;
 		TupleTableSlot *hashslot = perhash->hashslot;
 		TupleHashTable hashtable = perhash->hashtable;
 		TupleHashEntry entry;
 		MinimalTuple tuple;
 		uint32		hash;
 		bool		isnew = false;
-		bool	   *p_isnew = aggstate->hash_spill_mode ? NULL : &isnew;
+		bool	   *p_isnew = aggstate->spill_mode ? NULL : &isnew;
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -2782,11 +2945,11 @@ agg_refill_hash_table(AggState *aggstate)
 				 * that we don't assign tapes that will never be used.
 				 */
 				spill_initialized = true;
-				hashagg_spill_init(&spill, tapeset, batch->used_bits,
-								   batch->input_card, aggstate->hashentrysize);
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
 			}
 			/* no memory for a new group, spill */
-			hashagg_spill_tuple(aggstate, &spill, spillslot, hash);
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
 
 			aggstate->hash_pergroup[batch->setno] = NULL;
 		}
@@ -2806,16 +2969,16 @@ agg_refill_hash_table(AggState *aggstate)
 
 	if (spill_initialized)
 	{
-		hashagg_spill_finish(aggstate, &spill, batch->setno);
+		agg_spill_finish(aggstate, &spill, batch->setno);
 		hash_agg_update_metrics(aggstate, true, spill.npartitions);
 	}
 	else
 		hash_agg_update_metrics(aggstate, true, 0);
 
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 
 	/* prepare to walk the first hash table */
-	select_current_set(aggstate, batch->setno, true);
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_HASH);
 	ResetTupleHashIterator(aggstate->perhash[batch->setno].hashtable,
 						   &aggstate->perhash[batch->setno].hashiter);
 
@@ -2975,14 +3138,14 @@ agg_retrieve_hash_table_in_memory(AggState *aggstate)
 }
 
 /*
- * hashagg_spill_init
+ * agg_spill_init
  *
  * Called after we determined that spilling is necessary. Chooses the number
  * of partitions to create, and initializes them.
  */
 static void
-hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
-				   double input_groups, double hashentrysize)
+agg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
+			   double input_groups, double hashentrysize)
 {
 	int			npartitions;
 	int			partition_bits;
@@ -3018,14 +3181,13 @@ hashagg_spill_init(HashAggSpill *spill, LogicalTapeSet *tapeset, int used_bits,
 }
 
 /*
- * hashagg_spill_tuple
+ * agg_spill_tuple
  *
- * No room for new groups in the hash table. Save for later in the appropriate
- * partition.
+ * No room for new groups in memory. Save for later in the appropriate partition.
  */
 static Size
-hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
-					TupleTableSlot *inputslot, uint32 hash)
+agg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
+				TupleTableSlot *inputslot, uint32 hash)
 {
 	TupleTableSlot *spillslot;
 	int			partition;
@@ -3039,7 +3201,7 @@ hashagg_spill_tuple(AggState *aggstate, HashAggSpill *spill,
 	/* spill only attributes that we actually need */
 	if (!aggstate->all_cols_needed)
 	{
-		spillslot = aggstate->hash_spill_wslot;
+		spillslot = aggstate->spill_wslot;
 		slot_getsomeattrs(inputslot, aggstate->max_colno_needed);
 		ExecClearTuple(spillslot);
 		for (int i = 0; i < spillslot->tts_tupleDescriptor->natts; i++)
@@ -3167,14 +3329,14 @@ hashagg_finish_initial_spills(AggState *aggstate)
 	int			setno;
 	int			total_npartitions = 0;
 
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			total_npartitions += spill->npartitions;
-			hashagg_spill_finish(aggstate, spill, setno);
+			agg_spill_finish(aggstate, spill, setno);
 		}
 
 		/*
@@ -3182,21 +3344,21 @@ hashagg_finish_initial_spills(AggState *aggstate)
 		 * processing batches of spilled tuples. The initial spill structures
 		 * are no longer needed.
 		 */
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	hash_agg_update_metrics(aggstate, false, total_npartitions);
-	aggstate->hash_spill_mode = false;
+	aggstate->spill_mode = false;
 }
 
 /*
- * hashagg_spill_finish
+ * agg_spill_finish
  *
  * Transform spill partitions into new batches.
  */
 static void
-hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
+agg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 {
 	int			i;
 	int			used_bits = 32 - spill->shift;
@@ -3223,8 +3385,8 @@ hashagg_spill_finish(AggState *aggstate, HashAggSpill *spill, int setno)
 		new_batch = hashagg_batch_new(tape, setno,
 									  spill->ntuples[i], cardinality,
 									  used_bits);
-		aggstate->hash_batches = lappend(aggstate->hash_batches, new_batch);
-		aggstate->hash_batches_used++;
+		aggstate->spill_batches = lappend(aggstate->spill_batches, new_batch);
+		aggstate->spill_batches_used++;
 	}
 
 	pfree(spill->ntuples);
@@ -3239,33 +3401,668 @@ static void
 hashagg_reset_spill_state(AggState *aggstate)
 {
 	/* free spills from initial pass */
-	if (aggstate->hash_spills != NULL)
+	if (aggstate->spills != NULL)
 	{
 		int			setno;
 
 		for (setno = 0; setno < aggstate->num_hashes; setno++)
 		{
-			HashAggSpill *spill = &aggstate->hash_spills[setno];
+			HashAggSpill *spill = &aggstate->spills[setno];
 
 			pfree(spill->ntuples);
 			pfree(spill->partitions);
 		}
-		pfree(aggstate->hash_spills);
-		aggstate->hash_spills = NULL;
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
+	}
+
+	/* free batches */
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
+
+	/* close tape set */
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * agg_fill_index
+ *
+ * Read the entire input, inserting each tuple into the in-memory index
+ * and spilling to disk when the memory limit is reached.
+ */
+static void
+agg_fill_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *tmpcontext = aggstate->tmpcontext;
+
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		TupleTableSlot *outerslot;
+
+		outerslot = fetch_input_tuple(aggstate);
+		if (TupIsNull(outerslot))
+			break;
+
+		/* set up for lookup_index_entries and advance_aggregates */
+		tmpcontext->ecxt_outertuple = outerslot;
+
+		/* insert input tuple to index possibly spilling index to disk */
+		lookup_index_entries(aggstate);
+
+		/* Advance the aggregates (or combine functions) */
+		advance_aggregates(aggstate);
+
+		/*
+		 * Reset per-input-tuple context after each tuple, but note that the
+		 * hash lookups do this too
+		 */
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	/*
+	 * Mark the index as filled here, so that after expression
+	 * recompilation the expressions expect a MinimalTuple instead of
+	 * the outer plan's tuple type.
+	 */
+	aggstate->index_filled = true;
+
+	indexagg_finish_initial_spills(aggstate);
+
+	/*
+	 * This is only useful when no spill occurred and projection happens
+	 * in memory, but initialize it anyway.
+	 */
+	select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
+	InitTupleIndexIterator(perindex->index, &perindex->iter);
+}
+
+/*
+ * Extract the attributes that make up the grouping key into the
+ * indexslot.  This is needed to perform key comparisons in the index.
+ */
+static void
+prepare_index_slot(AggStatePerIndex perindex,
+				   TupleTableSlot *inputslot,
+				   TupleTableSlot *indexslot)
+{
+	slot_getsomeattrs(inputslot, perindex->largestGrpColIdx);
+	ExecClearTuple(indexslot);
+
+	for (int i = 0; i < perindex->numCols; ++i)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+		indexslot->tts_values[i] = inputslot->tts_values[varNumber];
+		indexslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
+	}
+	ExecStoreVirtualTuple(indexslot);
+}
+
+/*
+ * indexagg_reset_spill_state
+ *
+ * Free the spill state used by index aggregation (mirrors
+ * hashagg_reset_spill_state).
+ */
+static void
+indexagg_reset_spill_state(AggState *aggstate)
+{
+	/* free spills from initial pass */
+	if (aggstate->spills != NULL)
+	{
+		HashAggSpill *spill = &aggstate->spills[0];
+		pfree(spill->ntuples);
+		pfree(spill->partitions);
+		pfree(aggstate->spills);
+		aggstate->spills = NULL;
 	}
 
 	/* free batches */
-	list_free_deep(aggstate->hash_batches);
-	aggstate->hash_batches = NIL;
+	list_free_deep(aggstate->spill_batches);
+	aggstate->spill_batches = NIL;
 
 	/* close tape set */
-	if (aggstate->hash_tapeset != NULL)
+	if (aggstate->spill_tapeset != NULL)
+	{
+		LogicalTapeSetClose(aggstate->spill_tapeset);
+		aggstate->spill_tapeset = NULL;
+	}
+}
+
+/*
+ * Initialize the per-group states of a freshly-created index entry.
+ */
+static void
+initialize_index_entry(AggState *aggstate, TupleIndex index, TupleIndexEntry entry)
+{
+	AggStatePerGroup pergroup;
+
+	aggstate->spill_ngroups_current++;
+	index_agg_check_limits(aggstate);
+
+	/* no need to allocate or initialize per-group state */
+	if (aggstate->numtrans == 0)
+		return;
+
+	pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(index, entry);
+
+	/*
+	 * Initialize aggregates for the new tuple group; lookup_index_entries()
+	 * has already selected the relevant grouping set.
+	 */
+	for (int transno = 0; transno < aggstate->numtrans; ++transno)
+	{
+		AggStatePerTrans pertrans = &aggstate->pertrans[transno];
+		AggStatePerGroup pergroupstate = &pergroup[transno];
+
+		initialize_aggregate(aggstate, pertrans, pergroupstate);
+	}
+}
+
+/*
+ * Create a new sorted run from the current in-memory index.
+ */
+static void
+indexagg_save_index_run(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	ExprContext *econtext;
+	TupleIndexIteratorData iter;
+	AggStatePerAgg peragg;
+	TupleTableSlot *firstSlot;
+	TupleIndexEntry entry;
+	TupleTableSlot *indexslot;
+	AggStatePerGroup pergroup;
+
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	indexslot = perindex->indexslot;
+
+	InitTupleIndexIterator(perindex->index, &iter);
+
+	tuplemerge_start_run(aggstate->mergestate);
+
+	while ((entry = TupleIndexIteratorNext(&iter)) != NULL)
 	{
-		LogicalTapeSetClose(aggstate->hash_tapeset);
-		aggstate->hash_tapeset = NULL;
+		MinimalTuple tuple = TupleIndexEntryGetMinimalTuple(entry);
+		TupleTableSlot *output;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(tuple, indexslot, false);
+		slot_getallattrs(indexslot);
+
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		output = project_aggregates(aggstate);
+		if (output)
+			tuplemerge_puttupleslot(aggstate->mergestate, output);
 	}
+
+	tuplemerge_end_run(aggstate->mergestate);
 }
 
+/*
+ * Refill the in-memory index with the tuples of the given batch.
+ */
+static void
+indexagg_refill_batch(AggState *aggstate, HashAggBatch *batch)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *spillslot = aggstate->spill_rslot;
+	TupleTableSlot *indexslot = perindex->indexslot;
+	TupleIndex index = perindex->index;
+	LogicalTapeSet *tapeset = aggstate->spill_tapeset;
+	HashAggSpill spill;
+	bool	spill_initialized = false;
+
+	agg_set_limits(aggstate->hashentrysize, batch->input_card, batch->used_bits,
+				   &aggstate->spill_mem_limit, &aggstate->spill_ngroups_limit, NULL);
+
+	ReScanExprContext(aggstate->indexcontext);
+
+	MemoryContextReset(aggstate->index_entrycxt);
+	MemoryContextReset(aggstate->index_nodecxt);
+	ResetTupleIndex(perindex->index);
+
+	aggstate->spill_ngroups_current = 0;
+
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	agg_recompile_expressions(aggstate, true, true);
+
+	for (;;)
+	{
+		MinimalTuple tuple;
+		TupleIndexEntry entry;
+		bool		isnew = false;
+		bool	   *p_isnew;
+		uint32		hash;
+
+		CHECK_FOR_INTERRUPTS();
+
+		tuple = hashagg_batch_read(batch, &hash);
+		if (tuple == NULL)
+			break;
+
+		ExecStoreMinimalTuple(tuple, spillslot, true);
+		aggstate->tmpcontext->ecxt_outertuple = spillslot;
+
+		prepare_index_slot(perindex, spillslot, indexslot);
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		entry = TupleIndexLookup(index, indexslot, p_isnew);
+
+		if (entry != NULL)
+		{
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			aggstate->all_pergroups[batch->setno] = TupleIndexEntryGetAdditional(index, entry);
+			advance_aggregates(aggstate);
+		}
+		else
+		{
+			if (!spill_initialized)
+			{
+				spill_initialized = true;
+				agg_spill_init(&spill, tapeset, batch->used_bits,
+							   batch->input_card, aggstate->hashentrysize);
+			}
+
+			agg_spill_tuple(aggstate, &spill, spillslot, hash);
+			aggstate->all_pergroups[batch->setno] = NULL;
+		}
+
+		ResetExprContext(aggstate->tmpcontext);
+	}
+
+	LogicalTapeClose(batch->input_tape);
+
+	if (spill_initialized)
+	{
+		agg_spill_finish(aggstate, &spill, 0);
+		index_agg_update_metrics(aggstate, true, spill.npartitions);
+	}
+	else
+		index_agg_update_metrics(aggstate, true, 0);
+
+	aggstate->spill_mode = false;
+	select_current_set(aggstate, batch->setno, GROUPING_STRATEGY_INDEX);
+
+	pfree(batch);
+}
+
+/*
+ * indexagg_finish_initial_spills
+ *
+ * Finalize the initial spill pass.  If any data was spilled, turn the
+ * current index and each remaining batch into sorted runs and perform
+ * the final external merge.
+ */
+static void
+indexagg_finish_initial_spills(AggState *aggstate)
+{
+	HashAggSpill *spill;
+	AggStatePerIndex perindex;
+	Sort		 *sort;
+
+	if (!aggstate->spill_ever_happened)
+		return;
+
+	Assert(aggstate->spills != NULL);
+
+	spill = aggstate->spills;
+	agg_spill_finish(aggstate, aggstate->spills, 0);
+
+	index_agg_update_metrics(aggstate, false, spill->npartitions);
+	aggstate->spill_mode = false;
+
+	pfree(aggstate->spills);
+	aggstate->spills = NULL;
+
+	perindex = aggstate->perindex;
+	sort = aggstate->index_sort;
+	aggstate->mergestate = tuplemerge_begin_heap(aggstate->ss.ps.ps_ResultTupleDesc,
+												 perindex->numKeyCols,
+												 perindex->idxKeyColIdxTL,
+												 sort->sortOperators,
+												 sort->collations,
+												 sort->nullsFirst,
+												 work_mem, NULL);
+	/*
+	 * Some data was spilled.  Index aggregate requires sorted output, so
+	 * we must now process all remaining spilled data and produce sorted
+	 * runs for the external merge.  The first saved run is the currently
+	 * open index.
+	 */
+	indexagg_save_index_run(aggstate);
+
+	while (aggstate->spill_batches != NIL)
+	{
+		HashAggBatch *batch = llast(aggstate->spill_batches);
+		aggstate->spill_batches = list_delete_last(aggstate->spill_batches);
+
+		indexagg_refill_batch(aggstate, batch);
+		indexagg_save_index_run(aggstate);
+	}
+
+	tuplemerge_performmerge(aggstate->mergestate);
+}
+
+/*
+ * Compute the hash of the grouping-key columns of inputslot; the hash
+ * is used to assign spilled tuples to partitions.
+ */
+static uint32
+index_calculate_input_slot_hash(AggState *aggstate,
+								TupleTableSlot *inputslot)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext oldcxt;
+	uint32 hash;
+	bool isnull;
+
+	oldcxt = MemoryContextSwitchTo(aggstate->tmpcontext->ecxt_per_tuple_memory);
+
+	perindex->exprcontext->ecxt_innertuple = inputslot;
+	hash = DatumGetUInt32(ExecEvalExpr(perindex->indexhashexpr,
+									   perindex->exprcontext,
+									   &isnull));
+
+	MemoryContextSwitchTo(oldcxt);
+
+	return hash;
+}
+
+/*
+ * lookup_index_entries
+ *
+ * Insert the current input tuple into the in-memory index, or spill it
+ * to disk if no new groups can be created.
+ */
+static void
+lookup_index_entries(AggState *aggstate)
+{
+	int numGroupingSets = Max(aggstate->maxsets, 1);
+	AggStatePerGroup *pergroup = aggstate->all_pergroups;
+	TupleTableSlot *outerslot = aggstate->tmpcontext->ecxt_outertuple;
+
+	for (int setno = 0; setno < numGroupingSets; ++setno)
+	{
+		AggStatePerIndex	perindex = &aggstate->perindex[setno];
+		TupleIndex		index = perindex->index;
+		TupleTableSlot *indexslot = perindex->indexslot;
+		TupleIndexEntry	entry;
+		bool			isnew = false;
+		bool		   *p_isnew;
+
+		p_isnew = aggstate->spill_mode ? NULL : &isnew;
+		select_current_set(aggstate, setno, GROUPING_STRATEGY_INDEX);
+
+		prepare_index_slot(perindex, outerslot, indexslot);
+
+		/* Lookup entry in btree */
+		entry = TupleIndexLookup(perindex->index, indexslot, p_isnew);
+
+		/* Found or created an in-memory entry; no disk spill needed */
+		if (entry != NULL)
+		{
+			/* Initialize its trans state if just created */
+			if (isnew)
+				initialize_index_entry(aggstate, index, entry);
+
+			pergroup[setno] = TupleIndexEntryGetAdditional(index, entry);
+		}
+		else
+		{
+			HashAggSpill *spill = &aggstate->spills[setno];
+			uint32 hash;
+
+			if (spill->partitions == NULL)
+			{
+				agg_spill_init(spill, aggstate->spill_tapeset, 0,
+							   perindex->aggnode->numGroups,
+							   aggstate->hashentrysize);
+			}
+
+			hash = index_calculate_input_slot_hash(aggstate, indexslot);
+			agg_spill_tuple(aggstate, spill, outerslot, hash);
+			pergroup[setno] = NULL;
+		}
+	}
+}
+
+static TupleTableSlot *
+agg_retrieve_index_in_memory(AggState *aggstate)
+{
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+	AggStatePerIndex perindex;
+	AggStatePerAgg peragg;
+	AggStatePerGroup pergroup;
+	TupleTableSlot *result;
+
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+	peragg = aggstate->peragg;
+	perindex = &aggstate->perindex[aggstate->current_set];
+
+	for (;;)
+	{
+		TupleIndexEntry entry;
+		TupleTableSlot *indexslot = perindex->indexslot;
+
+		CHECK_FOR_INTERRUPTS();
+
+		entry = TupleIndexIteratorNext(&perindex->iter);
+		if (entry == NULL)
+			return NULL;
+
+		ResetExprContext(econtext);
+		ExecStoreMinimalTuple(TupleIndexEntryGetMinimalTuple(entry), indexslot, false);
+		slot_getallattrs(indexslot);
+
+		ExecClearTuple(firstSlot);
+		memset(firstSlot->tts_isnull, true,
+			   firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
+
+		for (int i = 0; i < perindex->numCols; i++)
+		{
+			int varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+			firstSlot->tts_values[varNumber] = indexslot->tts_values[i];
+			firstSlot->tts_isnull[varNumber] = indexslot->tts_isnull[i];
+		}
+		ExecStoreVirtualTuple(firstSlot);
+
+		pergroup = (AggStatePerGroup) TupleIndexEntryGetAdditional(perindex->index, entry);
+
+		econtext->ecxt_outertuple = firstSlot;
+		prepare_projection_slot(aggstate,
+								econtext->ecxt_outertuple,
+								aggstate->current_set);
+		finalize_aggregates(aggstate, peragg, pergroup);
+		result = project_aggregates(aggstate);
+		if (result)
+			return result;
+	}
+
+	/* no more groups */
+	return NULL;
+}
+
+static TupleTableSlot *
+agg_retrieve_index_merge(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	TupleTableSlot *slot = perindex->mergeslot;
+	TupleTableSlot *resultslot = aggstate->ss.ps.ps_ResultTupleSlot;
+
+	ExecClearTuple(slot);
+
+	if (!tuplesort_gettupleslot(aggstate->mergestate, true, true, slot, NULL))
+		return NULL;
+
+	slot_getallattrs(slot);
+	ExecClearTuple(resultslot);
+
+	for (int i = 0; i < resultslot->tts_tupleDescriptor->natts; ++i)
+	{
+		resultslot->tts_values[i] = slot->tts_values[i];
+		resultslot->tts_isnull[i] = slot->tts_isnull[i];
+	}
+	ExecStoreVirtualTuple(resultslot);
+
+	return resultslot;
+}
+
+/*
+ * agg_retrieve_index
+ *
+ * Return result tuples either from the in-memory index or, if any data
+ * was spilled, from the external merge.
+ */
+static TupleTableSlot *
+agg_retrieve_index(AggState *aggstate)
+{
+	if (aggstate->spill_ever_happened)
+		return agg_retrieve_index_merge(aggstate);
+	else
+		return agg_retrieve_index_in_memory(aggstate);
+}
+
+/*
+ * build_index
+ *
+ * Build the in-memory tuple index and the hash expression used to
+ * partition spilled tuples.
+ */
+static void
+build_index(AggState *aggstate)
+{
+	AggStatePerIndex perindex = aggstate->perindex;
+	MemoryContext metacxt = aggstate->index_metacxt;
+	MemoryContext entrycxt = aggstate->index_entrycxt;
+	MemoryContext nodecxt = aggstate->index_nodecxt;
+	MemoryContext oldcxt;
+	Size	additionalsize;
+	Oid	   *eqfuncoids;
+	Sort   *sort;
+
+	Assert(aggstate->aggstrategy == AGG_INDEX);
+
+	additionalsize = aggstate->numtrans * sizeof(AggStatePerGroupData);
+	sort = aggstate->index_sort;
+
+	/* build the in-memory index */
+	perindex->index = BuildTupleIndex(perindex->indexslot->tts_tupleDescriptor,
+									  perindex->numKeyCols,
+									  perindex->idxKeyColIdxIndex,
+									  sort->sortOperators,
+									  sort->collations,
+									  sort->nullsFirst,
+									  additionalsize,
+									  metacxt,
+									  entrycxt,
+									  nodecxt);
+
+	/* disk spill logic */
+	oldcxt = MemoryContextSwitchTo(metacxt);
+	execTuplesHashPrepare(perindex->numKeyCols, perindex->aggnode->grpOperators,
+						  &eqfuncoids, &perindex->hashfunctions);
+	perindex->indexhashexpr =
+		ExecBuildHash32FromAttrs(perindex->indexslot->tts_tupleDescriptor,
+								 perindex->indexslot->tts_ops,
+								 perindex->hashfunctions,
+								 perindex->aggnode->grpCollations,
+								 perindex->numKeyCols,
+								 perindex->idxKeyColIdxIndex,
+								 &aggstate->ss.ps,
+								 0);
+	perindex->exprcontext = CreateStandaloneExprContext();
+	MemoryContextSwitchTo(oldcxt);
+}
+
+/*
+ * find_index_columns
+ *
+ * Determine which input columns need to be stored in the index and
+ * build the index tuple descriptor (mirrors find_hash_columns).
+ */
+static void
+find_index_columns(AggState *aggstate)
+{
+	Bitmapset  *base_colnos;
+	Bitmapset  *aggregated_colnos;
+	TupleDesc	scanDesc = aggstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+	List	   *outerTlist = outerPlanState(aggstate)->plan->targetlist;
+	EState	   *estate = aggstate->ss.ps.state;
+	AggStatePerIndex perindex;
+	Bitmapset  *colnos;
+	AttrNumber *sortColIdx;
+	List	   *indexTlist = NIL;
+	TupleDesc   indexDesc;
+	int			maxCols;
+	int			i;
+
+	find_cols(aggstate, &aggregated_colnos, &base_colnos);
+	aggstate->colnos_needed = bms_union(base_colnos, aggregated_colnos);
+	aggstate->max_colno_needed = 0;
+	aggstate->all_cols_needed = true;
+
+	for (i = 0; i < scanDesc->natts; i++)
+	{
+		int		colno = i + 1;
+
+		if (bms_is_member(colno, aggstate->colnos_needed))
+			aggstate->max_colno_needed = colno;
+		else
+			aggstate->all_cols_needed = false;
+	}
+
+	perindex = aggstate->perindex;
+	colnos = bms_copy(base_colnos);
+
+	if (aggstate->phases[0].grouped_cols)
+	{
+		Bitmapset *grouped_cols = aggstate->phases[0].grouped_cols[0];
+		ListCell  *lc;
+		foreach(lc, aggstate->all_grouped_cols)
+		{
+			int attnum = lfirst_int(lc);
+			if (!bms_is_member(attnum, grouped_cols))
+				colnos = bms_del_member(colnos, attnum);
+		}
+	}
+
+	maxCols = bms_num_members(colnos) + perindex->numKeyCols;
+
+	perindex->idxKeyColIdxInput = palloc(maxCols * sizeof(AttrNumber));
+	perindex->idxKeyColIdxIndex = palloc(perindex->numKeyCols * sizeof(AttrNumber));
+
+	/* Add all the sorting/grouping columns to colnos */
+	sortColIdx = aggstate->index_sort->sortColIdx;
+	for (i = 0; i < perindex->numKeyCols; i++)
+		colnos = bms_add_member(colnos, sortColIdx[i]);
+
+	for (i = 0; i < perindex->numKeyCols; i++)
+	{
+		perindex->idxKeyColIdxInput[i] = sortColIdx[i];
+		perindex->idxKeyColIdxIndex[i] = i + 1;
+
+		perindex->numCols++;
+		/* delete already mapped columns */
+		colnos = bms_del_member(colnos, sortColIdx[i]);
+	}
+
+	/* and the remaining columns */
+	i = -1;
+	while ((i = bms_next_member(colnos, i)) >= 0)
+	{
+		perindex->idxKeyColIdxInput[perindex->numCols] = i;
+		perindex->numCols++;
+	}
+
+	/* build tuple descriptor for the index */
+	perindex->largestGrpColIdx = 0;
+	for (i = 0; i < perindex->numCols; i++)
+	{
+		int		varNumber = perindex->idxKeyColIdxInput[i] - 1;
+
+		indexTlist = lappend(indexTlist, list_nth(outerTlist, varNumber));
+		perindex->largestGrpColIdx = Max(varNumber + 1, perindex->largestGrpColIdx);
+	}
+
+	indexDesc = ExecTypeFromTL(indexTlist);
+	perindex->indexslot = ExecAllocTableSlot(&estate->es_tupleTable, indexDesc,
+										   &TTSOpsMinimalTuple);
+	list_free(indexTlist);
+	bms_free(colnos);
+
+	bms_free(base_colnos);
+}
 
 /* -----------------
  * ExecInitAgg
@@ -3297,10 +4094,12 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	int			numGroupingSets = 1;
 	int			numPhases;
 	int			numHashes;
+	int			numIndexes;
 	int			i = 0;
 	int			j = 0;
 	bool		use_hashing = (node->aggstrategy == AGG_HASHED ||
 							   node->aggstrategy == AGG_MIXED);
+	bool		use_index = (node->aggstrategy == AGG_INDEX);
 
 	/* check for unsupported flags */
 	Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -3337,6 +4136,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	 */
 	numPhases = (use_hashing ? 1 : 2);
 	numHashes = (use_hashing ? 1 : 0);
+	numIndexes = (use_index ? 1 : 0);
 
 	/*
 	 * Calculate the maximum number of grouping sets in any phase; this
@@ -3356,7 +4156,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 			/*
 			 * additional AGG_HASHED aggs become part of phase 0, but all
-			 * others add an extra phase.
+			 * others add an extra phase.  AGG_INDEX does not support grouping
+			 * sets, so the else branch must be AGG_SORTED or AGG_MIXED.
 			 */
 			if (agg->aggstrategy != AGG_HASHED)
 				++numPhases;
@@ -3395,6 +4196,8 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 
 	if (use_hashing)
 		hash_create_memory(aggstate);
+	else if (use_index)
+		index_create_memory(aggstate);
 
 	ExecAssignExprContext(estate, &aggstate->ss.ps);
 
@@ -3501,6 +4304,13 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		aggstate->phases[0].gset_lengths = palloc_array(int, numHashes);
 		aggstate->phases[0].grouped_cols = palloc_array(Bitmapset *, numHashes);
 	}
+	else if (numIndexes)
+	{
+		aggstate->perindex = palloc0(sizeof(AggStatePerIndexData) * numIndexes);
+		aggstate->phases[0].numsets = 0;
+		aggstate->phases[0].gset_lengths = palloc(numIndexes * sizeof(int));
+		aggstate->phases[0].grouped_cols = palloc(numIndexes * sizeof(Bitmapset *));
+	}
 
 	phase = 0;
 	for (phaseidx = 0; phaseidx <= list_length(node->chain); ++phaseidx)
@@ -3513,6 +4323,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			aggnode = list_nth_node(Agg, node->chain, phaseidx - 1);
 			sortnode = castNode(Sort, outerPlan(aggnode));
 		}
+		else if (use_index)
+		{
+			Assert(list_length(node->chain) == 1);
+
+			aggnode = node;
+			sortnode = castNode(Sort, linitial(node->chain));
+			/*
+			 * The chain contains a single element, so advance the loop
+			 * variable to make this the only iteration.
+			 */
+			phaseidx++;
+		}
 		else
 		{
 			aggnode = node;
@@ -3549,6 +4371,35 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
 			continue;
 		}
+		else if (aggnode->aggstrategy == AGG_INDEX)
+		{
+			AggStatePerPhase phasedata = &aggstate->phases[0];
+			AggStatePerIndex perindex;
+			Bitmapset *cols;
+
+			Assert(phase == 0);
+			Assert(sortnode);
+
+			i = phasedata->numsets++;
+
+			/* phase 0 always points to the "real" Agg in the index case */
+			phasedata->aggnode = node;
+			phasedata->aggstrategy = node->aggstrategy;
+			phasedata->sortnode = sortnode;
+
+			perindex = &aggstate->perindex[i];
+			perindex->aggnode = aggnode;
+			aggstate->index_sort = sortnode;
+
+			phasedata->gset_lengths[i] = perindex->numKeyCols = aggnode->numCols;
+
+			cols = NULL;
+			for (j = 0; j < aggnode->numCols; ++j)
+				cols = bms_add_member(cols, aggnode->grpColIdx[j]);
+
+			phasedata->grouped_cols[i] = cols;
+			all_grouped_cols = bms_add_members(all_grouped_cols, cols);
+		}
 		else
 		{
 			AggStatePerPhase phasedata = &aggstate->phases[++phase];
@@ -3666,7 +4517,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	aggstate->all_pergroups = palloc0_array(AggStatePerGroup, numGroupingSets + numHashes);
 	pergroups = aggstate->all_pergroups;
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy != AGG_HASHED && node->aggstrategy != AGG_INDEX)
 	{
 		for (i = 0; i < numGroupingSets; i++)
 		{
@@ -3680,18 +4531,15 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	/*
 	 * Hashing can only appear in the initial phase.
 	 */
-	if (use_hashing)
+	if (use_hashing || use_index)
 	{
 		Plan	   *outerplan = outerPlan(node);
 		double		totalGroups = 0;
 
-		aggstate->hash_spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsMinimalTuple);
-		aggstate->hash_spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
-															&TTSOpsVirtual);
-
-		/* this is an array of pointers, not structures */
-		aggstate->hash_pergroup = pergroups;
+		aggstate->spill_rslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsMinimalTuple);
+		aggstate->spill_wslot = ExecInitExtraTupleSlot(estate, scanDesc,
+													   &TTSOpsVirtual);
 
 		aggstate->hashentrysize = hash_agg_entry_size(aggstate->numtrans,
 													  outerplan->plan_width,
@@ -3706,20 +4554,115 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		for (int k = 0; k < aggstate->num_hashes; k++)
 			totalGroups += aggstate->perhash[k].aggnode->numGroups;
 
-		hash_agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
-							&aggstate->hash_mem_limit,
-							&aggstate->hash_ngroups_limit,
-							&aggstate->hash_planned_partitions);
-		find_hash_columns(aggstate);
+		agg_set_limits(aggstate->hashentrysize, totalGroups, 0,
+					   &aggstate->spill_mem_limit,
+					   &aggstate->spill_ngroups_limit,
+					   &aggstate->spill_planned_partitions);
+
+		if (use_hashing)
+		{
+			/* this is an array of pointers, not structures */
+			aggstate->hash_pergroup = pergroups;
+
+			find_hash_columns(aggstate);
+
+			/* Skip massive memory allocation if we are just doing EXPLAIN */
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_hash_tables(aggstate);
+			aggstate->table_filled = false;
+		}
+		else
+		{
+			find_index_columns(aggstate);
 
-		/* Skip massive memory allocation if we are just doing EXPLAIN */
-		if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
-			build_hash_tables(aggstate);
+			if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+				build_index(aggstate);
+			aggstate->index_filled = false;
+		}
 
-		aggstate->table_filled = false;
 
 		/* Initialize this to 1, meaning nothing spilled, yet */
-		aggstate->hash_batches_used = 1;
+		aggstate->spill_batches_used = 1;
+	}
+
+	/*
+	 * For index aggregation a disk spill may be required, and we perform an
+	 * external merge for it.  The spilled tuples are already projected, so
+	 * their TupleDesc differs from the in-memory ones (inputDesc/indexDesc).
+	 */
+	if (use_index)
+	{
+		AggStatePerIndex perindex = aggstate->perindex;
+		ListCell *lc;
+		List *targetlist = aggstate->ss.ps.plan->targetlist;
+		AttrNumber *attr_mapping_tl = 
+						palloc0(sizeof(AttrNumber) * list_length(targetlist));
+		AttrNumber *keyColIdxResult;
+
+		/*
+		 * Build the grouping column attribute mapping and store it in
+		 * attr_mapping_tl.  If there is no such mapping (the column is
+		 * projected away), InvalidAttrNumber is stored; otherwise we store
+		 * the index of the indexDesc column holding this attribute.
+		 */
+		foreach (lc, targetlist)
+		{
+			TargetEntry *te = (TargetEntry *)lfirst(lc);
+			Var *group_var;
+
+			/* All grouping expressions in targetlist stored as OUTER Vars */
+			if (!IsA(te->expr, Var))
+				continue;
+
+			group_var = (Var *)te->expr;
+			if (group_var->varno != OUTER_VAR)
+				continue;
+
+			attr_mapping_tl[foreach_current_index(lc)] = group_var->varattno;
+		}
+
+		/* Mapping is built and now create reverse mapping */
+		keyColIdxResult = palloc0(sizeof(AttrNumber) * list_length(outerPlan(node)->targetlist));
+		for (i = 0; i < list_length(targetlist); ++i)
+		{
+			AttrNumber outer_attno = attr_mapping_tl[i];
+			AttrNumber existingIdx;
+
+			if (!AttributeNumberIsValid(outer_attno))
+				continue;
+
+			existingIdx = keyColIdxResult[outer_attno - 1];
+
+			/* attnumbers can be duplicated, so keep the first one */
+			if (AttributeNumberIsValid(existingIdx) && existingIdx <= outer_attno)
+				continue;
+
+		/*
+		 * A column can be referenced in the query, but the planner can
+		 * decide to remove it from the grouping.
+		 */
+			if (!bms_is_member(outer_attno, all_grouped_cols))
+				continue;
+
+			keyColIdxResult[outer_attno - 1] = i + 1;
+		}
+
+		perindex->idxKeyColIdxTL = palloc(sizeof(AttrNumber) * perindex->numKeyCols);
+		for (i = 0; i < perindex->numKeyCols; ++i)
+		{
+			AttrNumber attno = keyColIdxResult[perindex->idxKeyColIdxInput[i] - 1];
+			if (!AttributeNumberIsValid(attno))
+				elog(ERROR, "could not locate group by attributes in targetlist for index mapping");
+
+			perindex->idxKeyColIdxTL[i] = attno;
+		}
+
+		pfree(attr_mapping_tl);
+		pfree(keyColIdxResult);
+
+		perindex->mergeslot = ExecInitExtraTupleSlot(estate,
+													 aggstate->ss.ps.ps_ResultTupleDesc, 
+													 &TTSOpsMinimalTuple);
 	}
 
 	/*
@@ -3732,13 +4675,19 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	{
 		aggstate->current_phase = 0;
 		initialize_phase(aggstate, 0);
-		select_current_set(aggstate, 0, true);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_HASH);
+	}
+	else if (node->aggstrategy == AGG_INDEX)
+	{
+		aggstate->current_phase = 0;
+		initialize_phase(aggstate, 0);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_INDEX);
 	}
 	else
 	{
 		aggstate->current_phase = 1;
 		initialize_phase(aggstate, 1);
-		select_current_set(aggstate, 0, false);
+		select_current_set(aggstate, 0, GROUPING_STRATEGY_SORT);
 	}
 
 	/*
@@ -4066,8 +5015,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 	for (phaseidx = 0; phaseidx < aggstate->numphases; phaseidx++)
 	{
 		AggStatePerPhase phase = &aggstate->phases[phaseidx];
-		bool		dohash = false;
-		bool		dosort = false;
+		int			strategy;
 
 		/* phase 0 doesn't necessarily exist */
 		if (!phase->aggnode)
@@ -4079,8 +5027,7 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 			 * Phase one, and only phase one, in a mixed agg performs both
 			 * sorting and aggregation.
 			 */
-			dohash = true;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_HASH | GROUPING_STRATEGY_SORT;
 		}
 		else if (aggstate->aggstrategy == AGG_MIXED && phaseidx == 0)
 		{
@@ -4094,19 +5041,24 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
 		else if (phase->aggstrategy == AGG_PLAIN ||
 				 phase->aggstrategy == AGG_SORTED)
 		{
-			dohash = false;
-			dosort = true;
+			strategy = GROUPING_STRATEGY_SORT;
 		}
 		else if (phase->aggstrategy == AGG_HASHED)
 		{
-			dohash = true;
-			dosort = false;
+			strategy = GROUPING_STRATEGY_HASH;
+		}
+		else if (phase->aggstrategy == AGG_INDEX)
+		{
+			strategy = GROUPING_STRATEGY_INDEX;
 		}
 		else
+		{
 			Assert(false);
+			/* keep compiler quiet */
+			strategy = 0;
+		}
 
-		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, dosort, dohash,
-											 false);
+		phase->evaltrans = ExecBuildAggTrans(aggstate, phase, strategy, false);
 
 		/* cache compiled expression for outer slot without NULL check */
 		phase->evaltrans_cache[0][0] = phase->evaltrans;
@@ -4409,9 +5361,9 @@ ExecEndAgg(AggState *node)
 
 		Assert(ParallelWorkerNumber <= node->shared_info->num_workers);
 		si = &node->shared_info->sinstrument[ParallelWorkerNumber];
-		si->hash_batches_used = node->hash_batches_used;
-		si->hash_disk_used = node->hash_disk_used;
-		si->hash_mem_peak = node->hash_mem_peak;
+		si->hash_batches_used = node->spill_batches_used;
+		si->hash_disk_used = node->spill_disk_used;
+		si->hash_mem_peak = node->spill_mem_peak;
 	}
 
 	/* Make sure we have closed any open tuplesorts */
@@ -4421,7 +5373,10 @@ ExecEndAgg(AggState *node)
 	if (node->sort_out)
 		tuplesort_end(node->sort_out);
 
-	hashagg_reset_spill_state(node);
+	if (node->aggstrategy == AGG_INDEX)
+		indexagg_reset_spill_state(node);
+	else
+		hashagg_reset_spill_state(node);
 
 	/* Release hash tables too */
 	if (node->hash_metacxt != NULL)
@@ -4434,6 +5389,26 @@ ExecEndAgg(AggState *node)
 		MemoryContextDelete(node->hash_tuplescxt);
 		node->hash_tuplescxt = NULL;
 	}
+	if (node->index_metacxt != NULL)
+	{
+		MemoryContextDelete(node->index_metacxt);
+		node->index_metacxt = NULL;
+	}
+	if (node->index_entrycxt != NULL)
+	{
+		MemoryContextDelete(node->index_entrycxt);
+		node->index_entrycxt = NULL;
+	}
+	if (node->index_nodecxt != NULL)
+	{
+		MemoryContextDelete(node->index_nodecxt);
+		node->index_nodecxt = NULL;
+	}
+	if (node->mergestate)
+	{
+		tuplesort_end(node->mergestate);
+		node->mergestate = NULL;
+	}
 
 	for (transno = 0; transno < node->numtrans; transno++)
 	{
@@ -4451,6 +5426,8 @@ ExecEndAgg(AggState *node)
 		ReScanExprContext(node->aggcontexts[setno]);
 	if (node->hashcontext)
 		ReScanExprContext(node->hashcontext);
+	if (node->indexcontext)
+		ReScanExprContext(node->indexcontext);
 
 	outerPlan = outerPlanState(node);
 	ExecEndNode(outerPlan);
@@ -4486,12 +5463,27 @@ ExecReScanAgg(AggState *node)
 		 * we can just rescan the existing hash table; no need to build it
 		 * again.
 		 */
-		if (outerPlan->chgParam == NULL && !node->hash_ever_spilled &&
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
 			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
 		{
 			ResetTupleHashIterator(node->perhash[0].hashtable,
 								   &node->perhash[0].hashiter);
-			select_current_set(node, 0, true);
+			select_current_set(node, 0, GROUPING_STRATEGY_HASH);
+			return;
+		}
+	}
+
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		if (!node->index_filled)
+			return;
+
+		if (outerPlan->chgParam == NULL && !node->spill_ever_happened &&
+			!bms_overlap(node->ss.ps.chgParam, aggnode->aggParams))
+		{
+			AggStatePerIndex perindex = node->perindex;
+			ResetTupleIndexIterator(perindex->index, &perindex->iter);
+			select_current_set(node, 0, GROUPING_STRATEGY_INDEX);
 			return;
 		}
 	}
@@ -4545,9 +5537,9 @@ ExecReScanAgg(AggState *node)
 	{
 		hashagg_reset_spill_state(node);
 
-		node->hash_ever_spilled = false;
-		node->hash_spill_mode = false;
-		node->hash_ngroups_current = 0;
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
 
 		ReScanExprContext(node->hashcontext);
 		/* Rebuild empty hash table(s) */
@@ -4555,10 +5547,33 @@ ExecReScanAgg(AggState *node)
 		node->table_filled = false;
 		/* iterator will be reset when the table is filled */
 
-		hashagg_recompile_expressions(node, false, false);
+		agg_recompile_expressions(node, false, false);
 	}
 
-	if (node->aggstrategy != AGG_HASHED)
+	if (node->aggstrategy == AGG_INDEX)
+	{
+		indexagg_reset_spill_state(node);
+
+		node->spill_ever_happened = false;
+		node->spill_mode = false;
+		node->spill_ngroups_current = 0;
+
+		ReScanExprContext(node->indexcontext);
+		MemoryContextReset(node->index_entrycxt);
+		MemoryContextReset(node->index_nodecxt);
+
+		build_index(node);
+		node->index_filled = false;
+
+		agg_recompile_expressions(node, false, false);
+
+		if (node->mergestate)
+		{
+			tuplesort_end(node->mergestate);
+			node->mergestate = NULL;
+		}
+	}
+	else if (node->aggstrategy != AGG_HASHED)
 	{
 		/*
 		 * Reset the per-group state (in particular, mark transvalues null)
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 473d31188b8..f3772ef4f1a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1900,6 +1900,7 @@ static void
 inittapestate(Tuplesortstate *state, int maxTapes)
 {
 	int64		tapeSpace;
+	Size		memtuplesSize;
 
 	/*
 	 * Decrease availMem to reflect the space needed for tape buffers; but
@@ -1912,7 +1913,16 @@ inittapestate(Tuplesortstate *state, int maxTapes)
 	 */
 	tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
 
-	if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
+	/*
+	 * In merge-only state we do not use the in-memory tuple array during
+	 * initial run creation; tuples are written to the tapes directly.
+	 */
+	if (state->memtuples != NULL)
+		memtuplesSize = GetMemoryChunkSpace(state->memtuples);
+	else
+		memtuplesSize = 0;
+
+	if (tapeSpace + memtuplesSize < state->allowedMem)
 		USEMEM(state, tapeSpace);
 
 	/*
@@ -2031,11 +2041,14 @@ mergeruns(Tuplesortstate *state)
 
 	/*
 	 * We no longer need a large memtuples array.  (We will allocate a smaller
-	 * one for the heap later.)
+	 * one for the heap later.)  Note that in merge state this array can be NULL.
 	 */
-	FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
-	pfree(state->memtuples);
-	state->memtuples = NULL;
+	if (state->memtuples)
+	{
+		FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
+		pfree(state->memtuples);
+		state->memtuples = NULL;
+	}
 
 	/*
 	 * Initialize the slab allocator.  We need one slab slot per input tape,
@@ -3157,3 +3170,189 @@ ssup_datum_int32_cmp(Datum x, Datum y, SortSupport ssup)
 	else
 		return 0;
 }
+
+/*
+ *    tuplemerge_begin_common
+ *
+ * Create a new Tuplesortstate for performing a merge only.  This is used
+ * when we know the input is already sorted but stored across multiple
+ * tapes, so only a merge has to be performed.
+ *
+ * Unlike tuplesort_begin_common, it does not accept sortopt, because no
+ * current option (random access or bounded sort) is supported by merge.
+ */
+Tuplesortstate *
+tuplemerge_begin_common(int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state;
+	MemoryContext maincontext;
+	MemoryContext sortcontext;
+	MemoryContext oldcontext;
+
+	/*
+	 * Memory context surviving tuplesort_reset.  This memory context holds
+	 * data which is useful to keep while sorting multiple similar batches.
+	 */
+	maincontext = AllocSetContextCreate(CurrentMemoryContext,
+										"TupleMerge main",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Create a working memory context for one sort operation.  The content of
+	 * this context is deleted by tuplesort_reset.
+	 */
+	sortcontext = AllocSetContextCreate(maincontext,
+										"TupleMerge merge",
+										ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Make the Tuplesortstate within the per-sortstate context.  This way, we
+	 * don't need a separate pfree() operation for it at shutdown.
+	 */
+	oldcontext = MemoryContextSwitchTo(maincontext);
+
+	state = (Tuplesortstate *) palloc0(sizeof(Tuplesortstate));
+
+	if (trace_sort)
+		pg_rusage_init(&state->ru_start);
+
+	state->base.sortopt = TUPLESORT_NONE;
+	state->base.tuples = true;
+	state->abbrevNext = 10;
+
+	/*
+	 * workMem is forced to be at least 64KB, the current minimum valid value
+	 * for the work_mem GUC.  This is a defense against parallel sort callers
+	 * that divide out memory among many workers in a way that leaves each
+	 * with very little memory.
+	 */
+	state->allowedMem = Max(workMem, 64) * (int64) 1024;
+	state->base.sortcontext = sortcontext;
+	state->base.maincontext = maincontext;
+
+	/*
+	 * After all of the other non-parallel-related state, we setup all of the
+	 * state needed for each batch.
+	 */
+
+	/*
+	 * Merging does not accept RANDOMACCESS, so the only possible tuple
+	 * context is Bump, which saves some cycles.
+	 */
+	state->base.tuplecontext = BumpContextCreate(state->base.sortcontext,
+												 "Caller tuples",
+												 ALLOCSET_DEFAULT_SIZES);
+
+	state->status = TSS_BUILDRUNS;
+	state->bounded = false;
+	state->boundUsed = false;
+	state->availMem = state->allowedMem;
+
+	/*
+	 * When performing a merge we do not need the in-memory array for
+	 * sorting, so memtuples stays NULL.  We still initialize the related
+	 * bookkeeping fields so that invoking an inappropriate function in
+	 * merge mode does not fail.
+	 */
+	state->memtuples = NULL;
+	state->memtupcount = 0;
+	state->memtupsize = INITIAL_MEMTUPSIZE;
+	state->growmemtuples = true;
+	state->slabAllocatorUsed = false;
+
+	/*
+	 * Tape variables (inputTapes, outputTapes, etc.) will be initialized by
+	 * inittapes(), if needed.
+	 */
+	state->result_tape = NULL;	/* flag that result tape has not been formed */
+	state->tapeset = NULL;
+
+	inittapes(state, true);
+
+	/*
+	 * Initialize parallel-related state based on coordination information
+	 * from caller
+	 */
+	if (!coordinate)
+	{
+		/* Serial sort */
+		state->shared = NULL;
+		state->worker = -1;
+		state->nParticipants = -1;
+	}
+	else if (coordinate->isWorker)
+	{
+		/* Parallel worker produces exactly one final run from all input */
+		state->shared = coordinate->sharedsort;
+		state->worker = worker_get_identifier(state);
+		state->nParticipants = -1;
+	}
+	else
+	{
+		/* Parallel leader state only used for final merge */
+		state->shared = coordinate->sharedsort;
+		state->worker = -1;
+		state->nParticipants = coordinate->nParticipants;
+		Assert(state->nParticipants >= 1);
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_start_run(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+		return;
+
+	selectnewtape(state);
+	state->memtupcount = 0;
+}
+
+void
+tuplemerge_performmerge(Tuplesortstate *state)
+{
+	if (state->memtupcount == 0)
+	{
+		/*
+		 * We have started a new run, but no tuples were written to it.
+		 * mergeruns expects each run to have at least one tuple; otherwise
+		 * it will fail to even fill the initial merge heap.
+		 */
+		state->nOutputRuns--;
+	}
+	else
+		state->memtupcount = 0;
+
+	mergeruns(state);
+
+	state->current = 0;
+	state->eof_reached = false;
+	state->markpos_block = 0L;
+	state->markpos_offset = 0;
+	state->markpos_eof = false;
+}
+
+void
+tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple, Size tuplen)
+{
+	MemoryContext oldcxt = MemoryContextSwitchTo(state->base.sortcontext);
+
+	Assert(state->destTape);	
+	WRITETUP(state, state->destTape, tuple);
+
+	MemoryContextSwitchTo(oldcxt);
+
+	state->memtupcount++;
+}
+
+void
+tuplemerge_end_run(Tuplesortstate *state)
+{
+	if (state->memtupcount != 0)
+	{
+		markrunend(state->destTape);
+	}
+}
diff --git a/src/backend/utils/sort/tuplesortvariants.c b/src/backend/utils/sort/tuplesortvariants.c
index e3e1142126e..24048192fb2 100644
--- a/src/backend/utils/sort/tuplesortvariants.c
+++ b/src/backend/utils/sort/tuplesortvariants.c
@@ -2070,3 +2070,108 @@ readtup_datum(Tuplesortstate *state, SortTuple *stup,
 	if (base->sortopt & TUPLESORT_RANDOMACCESS) /* need trailing length word? */
 		LogicalTapeReadExact(tape, &tuplen, sizeof(tuplen));
 }
+
+Tuplesortstate *
+tuplemerge_begin_heap(TupleDesc tupDesc,
+					  int nkeys, AttrNumber *attNums,
+					  Oid *sortOperators, Oid *sortCollations,
+					  bool *nullsFirstFlags,
+					  int workMem, SortCoordinate coordinate)
+{
+	Tuplesortstate *state = tuplemerge_begin_common(workMem, coordinate);
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext;
+	int			i;
+
+	oldcontext = MemoryContextSwitchTo(base->maincontext);
+
+	Assert(nkeys > 0);
+
+	if (trace_sort)
+		elog(LOG,
+			 "begin tuple merge: nkeys = %d, workMem = %d", nkeys, workMem);
+
+	base->nKeys = nkeys;
+
+	TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
+								false,	/* no unique check */
+								nkeys,
+								workMem,
+								false,
+								PARALLEL_SORT(coordinate));
+
+	base->removeabbrev = removeabbrev_heap;
+	base->comparetup = comparetup_heap;
+	base->comparetup_tiebreak = comparetup_heap_tiebreak;
+	base->writetup = writetup_heap;
+	base->readtup = readtup_heap;
+	base->haveDatum1 = true;
+	base->arg = tupDesc;		/* assume we need not copy tupDesc */
+
+	/* Prepare SortSupport data for each column */
+	base->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
+
+	for (i = 0; i < nkeys; i++)
+	{
+		SortSupport sortKey = base->sortKeys + i;
+
+		Assert(attNums[i] != 0);
+		Assert(sortOperators[i] != 0);
+
+		sortKey->ssup_cxt = CurrentMemoryContext;
+		sortKey->ssup_collation = sortCollations[i];
+		sortKey->ssup_nulls_first = nullsFirstFlags[i];
+		sortKey->ssup_attno = attNums[i];
+		/* Convey if abbreviation optimization is applicable in principle */
+		sortKey->abbreviate = (i == 0 && base->haveDatum1);
+
+		PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
+	}
+
+	/*
+	 * The "onlyKey" optimization cannot be used with abbreviated keys, since
+	 * tie-breaker comparisons may be required.  Typically, the optimization
+	 * is only of value to pass-by-value types anyway, whereas abbreviated
+	 * keys are typically only of value to pass-by-reference types.
+	 */
+	if (nkeys == 1 && !base->sortKeys->abbrev_converter)
+		base->onlyKey = base->sortKeys;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+void
+tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot)
+{
+	TuplesortPublic *base = TuplesortstateGetPublic(state);
+	MemoryContext oldcontext = MemoryContextSwitchTo(base->tuplecontext);
+	TupleDesc	tupDesc = (TupleDesc) base->arg;
+	SortTuple	stup;
+	MinimalTuple tuple;
+	HeapTupleData htup;
+	Size		tuplen;
+
+	/* copy the tuple into sort storage */
+	tuple = ExecCopySlotMinimalTuple(slot);
+	stup.tuple = tuple;
+	/* set up first-column key value */
+	htup.t_len = tuple->t_len + MINIMAL_TUPLE_OFFSET;
+	htup.t_data = (HeapTupleHeader) ((char *) tuple - MINIMAL_TUPLE_OFFSET);
+	stup.datum1 = heap_getattr(&htup,
+							   base->sortKeys[0].ssup_attno,
+							   tupDesc,
+							   &stup.isnull1);
+
+	/* GetMemoryChunkSpace is not supported for bump contexts */
+	if (TupleSortUseBumpTupleCxt(base->sortopt))
+		tuplen = MAXALIGN(tuple->t_len);
+	else
+		tuplen = GetMemoryChunkSpace(tuple);
+
+	tuplemerge_puttuple_common(state, &stup, tuplen);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index c923ca6d8a9..b1481a93753 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -393,8 +393,16 @@ extern ExprState *ExecInitExprWithParams(Expr *node, ParamListInfo ext_params);
 extern ExprState *ExecInitQual(List *qual, PlanState *parent);
 extern ExprState *ExecInitCheck(List *qual, PlanState *parent);
 extern List *ExecInitExprList(List *nodes, PlanState *parent);
+
+/* 
+ * Which strategy to use for aggregation/grouping
+ */
+#define GROUPING_STRATEGY_SORT			1
+#define GROUPING_STRATEGY_HASH			(1 << 1)
+#define GROUPING_STRATEGY_INDEX			(1 << 2)
+
 extern ExprState *ExecBuildAggTrans(AggState *aggstate, struct AggStatePerPhaseData *phase,
-									bool doSort, bool doHash, bool nullcheck);
+									int groupStrategy, bool nullcheck);
 extern ExprState *ExecBuildHash32FromAttrs(TupleDesc desc,
 										   const TupleTableSlotOps *ops,
 										   FmgrInfo *hashfunctions,
diff --git a/src/include/executor/nodeAgg.h b/src/include/executor/nodeAgg.h
index 1e1be9666ae..dc14b714369 100644
--- a/src/include/executor/nodeAgg.h
+++ b/src/include/executor/nodeAgg.h
@@ -321,6 +321,33 @@ typedef struct AggStatePerHashData
 	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */
 } AggStatePerHashData;
 
+/* 
+ * AggStatePerIndexData - per-index state
+ *
+ * Logic is the same as for AggStatePerHashData - one of these for each
+ * grouping set.
+ */
+typedef struct AggStatePerIndexData
+{
+	TupleIndex	index;			/* current in-memory index data */
+	MemoryContext metacxt;		/* memory context containing TupleIndex */
+	MemoryContext tempctx;		/* short-lived context */
+	TupleTableSlot *indexslot; 	/* slot for loading index */
+	int			numCols;		/* total number of columns in index tuple */
+	int			numKeyCols;		/* number of key columns in index tuple */
+	int			largestGrpColIdx;	/* largest col required for comparison */
+	AttrNumber *idxKeyColIdxInput;	/* key column indices in input slot */
+	AttrNumber *idxKeyColIdxIndex;	/* key column indices in index tuples */
+	TupleIndexIteratorData iter;	/* iterator state for index */
+	Agg		   *aggnode;		/* original Agg node, for numGroups etc. */	
+
+	/* state used only for spill mode */
+	AttrNumber	*idxKeyColIdxTL;	/* key column indices in target list */
+	FmgrInfo    *hashfunctions;	/* tuple hashing function */
+	ExprState   *indexhashexpr;	/* ExprState for hashing index datatype(s) */
+	ExprContext *exprcontext;	/* expression context */
+	TupleTableSlot *mergeslot;	/* slot for loading tuple during merge */
+}			AggStatePerIndexData;
 
 extern AggState *ExecInitAgg(Agg *node, EState *estate, int eflags);
 extern void ExecEndAgg(AggState *node);
@@ -328,9 +355,9 @@ extern void ExecReScanAgg(AggState *node);
 
 extern Size hash_agg_entry_size(int numTrans, Size tupleWidth,
 								Size transitionSpace);
-extern void hash_agg_set_limits(double hashentrysize, double input_groups,
-								int used_bits, Size *mem_limit,
-								uint64 *ngroups_limit, int *num_partitions);
+extern void agg_set_limits(double hashentrysize, double input_groups,
+						   int used_bits, Size *mem_limit,
+						   uint64 *ngroups_limit, int *num_partitions);
 
 /* parallel instrumentation support */
 extern void ExecAggEstimate(AggState *node, ParallelContext *pcxt);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index b6ad28618ab..ba7f84199ad 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -365,6 +365,7 @@ typedef enum AggStrategy
 	AGG_SORTED,					/* grouped agg, input must be sorted */
 	AGG_HASHED,					/* grouped agg, use internal hashtable */
 	AGG_MIXED,					/* grouped agg, hash and sort both used */
+	AGG_INDEX,					/* grouped agg, build index for input */
 } AggStrategy;
 
 /*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 4bc6fb5670e..6c9e5db013b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1219,7 +1219,7 @@ typedef struct Agg
 	/* grouping sets to use */
 	List	   *groupingSets;
 
-	/* chained Agg/Sort nodes */
+	/* chained Agg/Sort nodes, for AGG_INDEX contains single Sort node */
 	List	   *chain;
 } Agg;
 
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 0a156bce44d..9eb31b73ce1 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -475,6 +475,21 @@ extern GinTuple *tuplesort_getgintuple(Tuplesortstate *state, Size *len,
 									   bool forward);
 extern bool tuplesort_getdatum(Tuplesortstate *state, bool forward, bool copy,
 							   Datum *val, bool *isNull, Datum *abbrev);
-
+/*
+ * Special entry points for merge-only mode.
+ */
+extern Tuplesortstate *tuplemerge_begin_common(int workMem,
+											   SortCoordinate coordinate);
+extern Tuplesortstate *tuplemerge_begin_heap(TupleDesc tupDesc,
+											int nkeys, AttrNumber *attNums,
+											Oid *sortOperators, Oid *sortCollations,
+											bool *nullsFirstFlags,
+											int workMem, SortCoordinate coordinate);
+extern void tuplemerge_start_run(Tuplesortstate *state);
+extern void tuplemerge_end_run(Tuplesortstate *state);
+extern void tuplemerge_puttuple_common(Tuplesortstate *state, SortTuple *tuple,
+									   Size tuplen);
+extern void tuplemerge_puttupleslot(Tuplesortstate *state, TupleTableSlot *slot);
+extern void tuplemerge_performmerge(Tuplesortstate *state);
 
 #endif							/* TUPLESORT_H */
-- 
2.43.0

Attachment: v4-0003-make-use-of-IndexAggregate-in-planner-and-explain.patch (text/x-patch; charset=UTF-8)
From b1b11336b95f489bc6feded2951f53bdebc6c904 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Wed, 3 Dec 2025 17:34:18 +0300
Subject: [PATCH v4 3/5] make use of IndexAggregate in planner and explain

This commit adds usage of IndexAggregate in planner and explain (analyze).

We calculate cost of IndexAggregate and add AGG_INDEX node to the pathlist.
Cost of this node is cost of building B+tree (in memory), disk spill and
final external merge.

For EXPLAIN there is only little change - show sort information in "Group Key".
---
 src/backend/commands/explain.c                | 101 +++++++++++--
 src/backend/optimizer/path/costsize.c         | 137 +++++++++++++-----
 src/backend/optimizer/plan/createplan.c       |  15 +-
 src/backend/optimizer/plan/planner.c          |  35 +++++
 src/backend/optimizer/util/pathnode.c         |   9 ++
 src/backend/utils/misc/guc_parameters.dat     |   7 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |   3 +-
 src/include/optimizer/cost.h                  |   1 +
 9 files changed, 251 insertions(+), 58 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1e68ad1565f..e108c377c81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -134,7 +134,7 @@ static void show_recursive_union_info(RecursiveUnionState *rstate,
 									  ExplainState *es);
 static void show_memoize_info(MemoizeState *mstate, List *ancestors,
 							  ExplainState *es);
-static void show_hashagg_info(AggState *aggstate, ExplainState *es);
+static void show_agg_spill_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 								ExplainState *es);
@@ -1556,6 +1556,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
 						pname = "MixedAggregate";
 						strategy = "Mixed";
 						break;
+					case AGG_INDEX:
+						pname = "IndexAggregate";
+						strategy = "Indexed";
+						break;
 					default:
 						pname = "Aggregate ???";
 						strategy = "???";
@@ -2200,7 +2204,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
-			show_hashagg_info((AggState *) planstate, es);
+			show_agg_spill_info((AggState *) planstate, es);
 			if (plan->qual)
 				show_instrumentation_count("Rows Removed by Filter", 1,
 										   planstate, es);
@@ -2631,6 +2635,24 @@ show_agg_keys(AggState *astate, List *ancestors,
 
 		if (plan->groupingSets)
 			show_grouping_sets(outerPlanState(astate), plan, ancestors, es);
+		else if (plan->aggstrategy == AGG_INDEX)
+		{
+			Sort	   *sort = astate->index_sort;
+
+			/*
+			 * Index Agg reorders the GROUP BY keys to match ORDER BY, so
+			 * they must be the same, but we should show other useful
+			 * information about the ordering used, such as direction.
+			 */
+			Assert(sort != NULL);
+			show_sort_group_keys(outerPlanState(astate), "Group Key",
+								 plan->numCols, 0,
+								 sort->sortColIdx,
+								 sort->sortOperators,
+								 sort->collations,
+								 sort->nullsFirst,
+								 ancestors, es);
+		}
 		else
 			show_sort_group_keys(outerPlanState(astate), "Group Key",
 								 plan->numCols, 0, plan->grpColIdx,
@@ -3735,47 +3757,67 @@ show_memoize_info(MemoizeState *mstate, List *ancestors, ExplainState *es)
 }
 
 /*
- * Show information on hash aggregate memory usage and batches.
+ * Show information on hash or index aggregate memory usage and batches.
  */
 static void
-show_hashagg_info(AggState *aggstate, ExplainState *es)
+show_agg_spill_info(AggState *aggstate, ExplainState *es)
 {
 	Agg		   *agg = (Agg *) aggstate->ss.ps.plan;
-	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->hash_mem_peak);
+	int64		memPeakKb = BYTES_TO_KILOBYTES(aggstate->spill_mem_peak);
 
 	if (agg->aggstrategy != AGG_HASHED &&
-		agg->aggstrategy != AGG_MIXED)
+		agg->aggstrategy != AGG_MIXED &&
+		agg->aggstrategy != AGG_INDEX)
 		return;
 
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		if (es->costs)
 			ExplainPropertyInteger("Planned Partitions", NULL,
-								   aggstate->hash_planned_partitions, es);
+								   aggstate->spill_planned_partitions, es);
 
 		/*
 		 * During parallel query the leader may have not helped out.  We
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			ExplainPropertyInteger("HashAgg Batches", NULL,
-								   aggstate->hash_batches_used, es);
+								   aggstate->spill_batches_used, es);
 			ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es);
 			ExplainPropertyInteger("Disk Usage", "kB",
-								   aggstate->hash_disk_used, es);
+								   aggstate->spill_disk_used, es);
+		}
+
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64 spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			ExplainPropertyText("Merge Method", mergeMethod, es);
+			ExplainPropertyInteger("Merge Space Used", "kB", spaceUsed, es);
+			ExplainPropertyText("Merge Space Type", spaceType, es);
 		}
 	}
 	else
 	{
 		bool		gotone = false;
 
-		if (es->costs && aggstate->hash_planned_partitions > 0)
+		if (es->costs && aggstate->spill_planned_partitions > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfo(es->str, "Planned Partitions: %d",
-							 aggstate->hash_planned_partitions);
+							 aggstate->spill_planned_partitions);
 			gotone = true;
 		}
 
@@ -3784,7 +3826,7 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 		 * detect this by checking how much memory it used.  If we find it
 		 * didn't do any work then we don't show its properties.
 		 */
-		if (es->analyze && aggstate->hash_mem_peak > 0)
+		if (es->analyze && aggstate->spill_mem_peak > 0)
 		{
 			if (!gotone)
 				ExplainIndentText(es);
@@ -3792,17 +3834,44 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 				appendStringInfoSpaces(es->str, 2);
 
 			appendStringInfo(es->str, "Batches: %d  Memory Usage: " INT64_FORMAT "kB",
-							 aggstate->hash_batches_used, memPeakKb);
+							 aggstate->spill_batches_used, memPeakKb);
 			gotone = true;
 
 			/* Only display disk usage if we spilled to disk */
-			if (aggstate->hash_batches_used > 1)
+			if (aggstate->spill_batches_used > 1)
 			{
 				appendStringInfo(es->str, "  Disk Usage: " UINT64_FORMAT "kB",
-								 aggstate->hash_disk_used);
+								 aggstate->spill_disk_used);
 			}
 		}
 
+		/* For index aggregate, show stats for the final merge */
+		if (es->analyze &&
+			aggstate->aggstrategy == AGG_INDEX &&
+			aggstate->mergestate != NULL)
+		{
+			TuplesortInstrumentation stats;
+			const char *mergeMethod;
+			const char *spaceType;
+			int64 spaceUsed;
+
+			tuplesort_get_stats(aggstate->mergestate, &stats);
+			mergeMethod = tuplesort_method_name(stats.sortMethod);
+			spaceType = tuplesort_space_type_name(stats.spaceType);
+			spaceUsed = stats.spaceUsed;
+
+			/*
+			 * If we get here, the previous check (for the memory peak) must
+			 * have succeeded: we cannot reach the merge phase without any
+			 * in-memory work.  So don't check any other state, just start a
+			 * new line.
+			 */
+			appendStringInfoChar(es->str, '\n');
+			ExplainIndentText(es);
+			appendStringInfo(es->str, "Merge Method: %s  %s: " INT64_FORMAT "kB",
+							 mergeMethod, spaceType, spaceUsed);
+			gotone = true;
+		}
+
 		if (gotone)
 			appendStringInfoChar(es->str, '\n');
 	}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 16bf1f61a0f..758e42f3caa 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool		enable_tidscan = true;
 bool		enable_sort = true;
 bool		enable_incremental_sort = true;
 bool		enable_hashagg = true;
+bool		enable_indexagg = true;
 bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_memoize = true;
@@ -1848,6 +1849,32 @@ cost_recursive_union(Path *runion, Path *nrterm, Path *rterm)
 									rterm->pathtarget->width);
 }
 
+/*
+ * cost_tuplemerge
+ *		Determines the disk cost of the external merge phase used in
+ *		tuplesort and adds it to *cost.
+ */
+static void
+cost_tuplemerge(double availMem, double input_bytes, double ntuples,
+				Cost comparison_cost, Cost *cost)
+{
+	double		npages = ceil(input_bytes / BLCKSZ);
+	double		nruns = input_bytes / availMem;
+	double		mergeorder = tuplesort_merge_order(availMem);
+	double		log_runs;
+	double		npageaccesses;
+
+	/* Compute logM(r) as log(r) / log(M) */
+	if (nruns > mergeorder)
+		log_runs = ceil(log(nruns) / log(mergeorder));
+	else
+		log_runs = 1.0;
+
+	npageaccesses = 2.0 * npages * log_runs;
+
+	/* Assume 3/4ths of accesses are sequential, 1/4th are not */
+	*cost += npageaccesses * (seq_page_cost * 0.75 + random_page_cost * 0.25);
+}
+
 /*
  * cost_tuplesort
  *	  Determines and returns the cost of sorting a relation using tuplesort,
@@ -1922,11 +1949,6 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		/*
 		 * We'll have to use a disk-based sort of all the tuples
 		 */
-		double		npages = ceil(input_bytes / BLCKSZ);
-		double		nruns = input_bytes / sort_mem_bytes;
-		double		mergeorder = tuplesort_merge_order(sort_mem_bytes);
-		double		log_runs;
-		double		npageaccesses;
 
 		/*
 		 * CPU costs
@@ -1936,16 +1958,8 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		*startup_cost = comparison_cost * tuples * LOG2(tuples);
 
 		/* Disk costs */
-
-		/* Compute logM(r) as log(r) / log(M) */
-		if (nruns > mergeorder)
-			log_runs = ceil(log(nruns) / log(mergeorder));
-		else
-			log_runs = 1.0;
-		npageaccesses = 2.0 * npages * log_runs;
-		/* Assume 3/4ths of accesses are sequential, 1/4th are not */
-		*startup_cost += npageaccesses *
-			(seq_page_cost * 0.75 + random_page_cost * 0.25);
+		cost_tuplemerge(sort_mem_bytes, input_bytes, tuples, comparison_cost,
+						startup_cost);
 	}
 	else if (tuples > 2 * output_tuples || input_bytes > sort_mem_bytes)
 	{
@@ -2770,7 +2784,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
-	else
+	else if (aggstrategy == AGG_HASHED)
 	{
 		/* must be AGG_HASHED */
 		startup_cost = input_total_cost;
@@ -2788,6 +2802,50 @@ cost_agg(Path *path, PlannerInfo *root,
 		total_cost += cpu_tuple_cost * numGroups;
 		output_tuples = numGroups;
 	}
+	else
+	{
+		/* must be AGG_INDEX */
+		startup_cost = input_total_cost;
+		if (!enable_indexagg)
+			++disabled_nodes;
+
+		/* these match AGG_HASHED */
+		startup_cost += aggcosts->transCost.startup;
+		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
+		startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
+		startup_cost += aggcosts->finalCost.startup;
+
+		/* cost of btree top-down traversal */
+		startup_cost += LOG2(numGroups) /* number of comparisons */
+			* (2.0 * cpu_operator_cost) /* comparison cost */
+			* input_tuples;
+
+		total_cost = startup_cost;
+		total_cost += aggcosts->finalCost.per_tuple * numGroups;
+		total_cost += cpu_tuple_cost * numGroups;
+		output_tuples = numGroups;
+	}
+
+	/*
+	 * If there are quals (HAVING quals), account for their cost and
+	 * selectivity.  Do this before the disk-spill logic, because the output
+	 * cardinality is needed there for AGG_INDEX.
+	 */
+	if (quals)
+	{
+		QualCost	qual_cost;
+
+		cost_qual_eval(&qual_cost, quals, root);
+		startup_cost += qual_cost.startup;
+		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
+
+		output_tuples = clamp_row_est(output_tuples *
+									  clauselist_selectivity(root,
+															 quals,
+															 0,
+															 JOIN_INNER,
+															 NULL));
+	}
 
 	/*
 	 * Add the disk costs of hash aggregation that spills to disk.
@@ -2802,7 +2860,7 @@ cost_agg(Path *path, PlannerInfo *root,
 	 * Accrue writes (spilled tuples) to startup_cost and to total_cost;
 	 * accrue reads only to total_cost.
 	 */
-	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED)
+	if (aggstrategy == AGG_HASHED || aggstrategy == AGG_MIXED || aggstrategy == AGG_INDEX)
 	{
 		double		pages;
 		double		pages_written = 0.0;
@@ -2814,6 +2872,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		uint64		ngroups_limit;
 		int			num_partitions;
 		int			depth;
+		bool		canspill;
 
 		/*
 		 * Estimate number of batches based on the computed limits. If less
@@ -2823,8 +2882,9 @@ cost_agg(Path *path, PlannerInfo *root,
 		hashentrysize = hash_agg_entry_size(list_length(root->aggtransinfos),
 											input_width,
 											aggcosts->transitionSpace);
-		hash_agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
-							&ngroups_limit, &num_partitions);
+		agg_set_limits(hashentrysize, numGroups, 0, &mem_limit,
+					   &ngroups_limit, &num_partitions);
+		canspill = num_partitions != 0;
 
 		nbatches = Max((numGroups * hashentrysize) / mem_limit,
 					   numGroups / ngroups_limit);
@@ -2861,26 +2921,27 @@ cost_agg(Path *path, PlannerInfo *root,
 		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
 		startup_cost += spill_cost;
 		total_cost += spill_cost;
-	}
-
-	/*
-	 * If there are quals (HAVING quals), account for their cost and
-	 * selectivity.
-	 */
-	if (quals)
-	{
-		QualCost	qual_cost;
 
-		cost_qual_eval(&qual_cost, quals, root);
-		startup_cost += qual_cost.startup;
-		total_cost += qual_cost.startup + output_tuples * qual_cost.per_tuple;
-
-		output_tuples = clamp_row_est(output_tuples *
-									  clauselist_selectivity(root,
-															 quals,
-															 0,
-															 JOIN_INNER,
-															 NULL));
+		/*
+		 * IndexAgg requires a final external merge stage, but only if a
+		 * spill can occur; otherwise everything is processed in memory.
+		 */
+		if (aggstrategy == AGG_INDEX && canspill)
+		{
+			double	output_bytes;
+			Cost	comparison_cost;
+			Cost	merge_cost = 0;
+
+			/* size of all projected tuples */
+			output_bytes = path->pathtarget->width * output_tuples;
+			/* default comparison cost */
+			comparison_cost = 2.0 * cpu_operator_cost;
+
+			cost_tuplemerge(work_mem, output_bytes, output_tuples,
+							comparison_cost, &merge_cost);
+			startup_cost += merge_cost;
+			total_cost += merge_cost;
+		}
 	}
 
 	path->rows = output_tuples;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index af41ca69929..7cc23885ef5 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2158,6 +2158,8 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 	Plan	   *subplan;
 	List	   *tlist;
 	List	   *quals;
+	List	   *chain;
+	AttrNumber *grpColIdx;
 
 	/*
 	 * Agg can project, so no need to be terribly picky about child tlist, but
@@ -2169,17 +2171,24 @@ create_agg_plan(PlannerInfo *root, AggPath *best_path)
 
 	quals = order_qual_clauses(root, best_path->qual);
 
+	grpColIdx = extract_grouping_cols(best_path->groupClause, subplan->targetlist);
+
+	/* For index aggregation we should consider the desired sorting order. */
+	if (best_path->aggstrategy == AGG_INDEX)
+		chain = list_make1(make_sort_from_groupcols(best_path->groupClause, grpColIdx, subplan));
+	else
+		chain = NIL;
+
 	plan = make_agg(tlist, quals,
 					best_path->aggstrategy,
 					best_path->aggsplit,
 					list_length(best_path->groupClause),
-					extract_grouping_cols(best_path->groupClause,
-										  subplan->targetlist),
+					grpColIdx,
 					extract_grouping_ops(best_path->groupClause),
 					extract_grouping_collations(best_path->groupClause,
 												subplan->targetlist),
 					NIL,
-					NIL,
+					chain,
 					best_path->numGroups,
 					best_path->transitionSpace,
 					subplan);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 615c823c67d..8cdbf919902 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3878,6 +3878,21 @@ create_grouping_paths(PlannerInfo *root,
 			 (gd ? gd->any_hashable : grouping_is_hashable(root->processed_groupClause))))
 			flags |= GROUPING_CAN_USE_HASH;
 
+	/*
+	 * Determine whether we should consider an index-based implementation of
+	 * grouping.
+	 *
+	 * This is more restrictive: the grouping must be not only sortable (for
+	 * the btree) but also hashable, so that we can efficiently spill tuples
+	 * and later process each batch.
+	 */
+		if (   gd == NULL
+			&& root->numOrderedAggs == 0
+			&& parse->groupClause != NIL
+			&& grouping_is_sortable(root->processed_groupClause)
+			&& grouping_is_hashable(root->processed_groupClause))
+			flags |= GROUPING_CAN_USE_INDEX;
+
 		/*
 		 * Determine whether partial aggregation is possible.
 		 */
@@ -7109,6 +7124,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups = 0;
@@ -7330,6 +7346,25 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 	}
 
+	if (can_index)
+	{
+		/* 
+		 * Generate IndexAgg path.
+		 */
+		Assert(!parse->groupingSets);
+		add_path(grouped_rel, (Path *)
+				 create_agg_path(root,
+								 grouped_rel,
+								 cheapest_path,
+								 grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_SIMPLE,
+								 root->processed_groupClause,
+								 havingQual,
+								 agg_costs,
+								 dNumGroups));
+	}
+
 	/*
 	 * When partitionwise aggregate is used, we might have fully aggregated
 	 * paths in the partial pathlist, because add_paths_to_append_rel() will
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2e9becf3116..93363eaba34 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3030,6 +3030,15 @@ create_agg_path(PlannerInfo *root,
 		else
 			pathnode->path.pathkeys = subpath->pathkeys;	/* preserves order */
 	}
+	else if (aggstrategy == AGG_INDEX)
+	{
+		/* 
+		 * When using index aggregation all grouping columns will be used as
+		 * comparator keys, so output is always sorted.
+		 */
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
+																root->processed_tlist);
+	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
 
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index cf87c09ca3b..183dac48405 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -877,6 +877,13 @@
   boot_val => 'true',
 },
 
+{ name => 'enable_indexagg', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of index aggregation plans.',
+  flags => 'GUC_EXPLAIN',
+  variable => 'enable_indexagg',
+  boot_val => 'true',
+},
+
 { name => 'enable_indexonlyscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
   short_desc => 'Enables the planner\'s use of index-only-scan plans.',
   flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..307b9ee660d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -410,6 +410,7 @@
 #enable_hashagg = on
 #enable_hashjoin = on
 #enable_incremental_sort = on
+#enable_indexagg = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
 #enable_material = on
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b72f000d2ac..6eed0075c24 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -3518,7 +3518,8 @@ typedef struct JoinPathExtraData
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
-#define GROUPING_CAN_PARTIAL_AGG	0x0004
+#define GROUPING_CAN_USE_INDEX		0x0004
+#define GROUPING_CAN_PARTIAL_AGG	0x0008
 
 /*
  * What kind of partitionwise aggregation is in use?
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 07b8bfa6377..1e06a3e5c49 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
 extern PGDLLIMPORT bool enable_sort;
 extern PGDLLIMPORT bool enable_incremental_sort;
 extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_indexagg;
 extern PGDLLIMPORT bool enable_nestloop;
 extern PGDLLIMPORT bool enable_material;
 extern PGDLLIMPORT bool enable_memoize;
-- 
2.43.0

Attachment: v4-0004-add-support-for-Partial-IndexAggregate.patch (text/x-patch)
From 504f0513a892de6fd5315e891dd7ce264db54f14 Mon Sep 17 00:00:00 2001
From: Sergey Soloviev <sergey.soloviev@tantorlabs.ru>
Date: Thu, 11 Dec 2025 14:30:37 +0300
Subject: [PATCH v4 4/5] add support for Partial IndexAggregate

IndexAggregate now supports partial aggregates. The main problem was that
partial aggregation creates a SortGroupClause for the same expression as in
the target list but with a different sortgroupref, so
make_pathkeys_for_sortclauses failed to find the required target list entry
and threw an ERROR.

To fix this, we now pass pathkeys to create_agg_path explicitly (but only
for AGG_INDEX for now), so the caller is responsible for looking up and
building the pathkeys list.
---
 src/backend/optimizer/path/allpaths.c  | 76 ++++++++++++++++++++
 src/backend/optimizer/plan/planner.c   | 98 ++++++++++++++++++++++++--
 src/backend/optimizer/prep/prepunion.c |  2 +
 src/backend/optimizer/util/pathnode.c  | 16 +++--
 src/include/optimizer/pathnode.h       |  1 +
 5 files changed, 185 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e641c146a3..d1397c2dc33 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3446,6 +3446,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 	AggClauseCosts agg_costs;
 	bool		can_hash;
 	bool		can_sort;
+	bool		can_index;
 	Path	   *cheapest_total_path = NULL;
 	Path	   *cheapest_partial_path = NULL;
 	double		dNumGroups = 0;
@@ -3498,6 +3499,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 	can_hash = (agg_info->group_clauses != NIL &&
 				grouping_is_hashable(agg_info->group_clauses));
 
+	/*
+	 * Determine whether we should consider an index-based implementation of
+	 * grouping.
+	 */
+	can_index = can_sort && can_hash;
+
 	/*
 	 * Consider whether we should generate partially aggregated non-partial
 	 * paths.  We can only do this if we have a non-partial path.
@@ -3615,6 +3622,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 											AGGSPLIT_INITIAL_SERIAL,
 											agg_info->group_clauses,
 											NIL,
+											NIL,
 											&agg_costs,
 											dNumGroups);
 
@@ -3691,6 +3699,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 											AGGSPLIT_INITIAL_SERIAL,
 											agg_info->group_clauses,
 											NIL,
+											NIL,
 											&agg_costs,
 											dNumPartialGroups);
 
@@ -3727,6 +3736,7 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 										AGGSPLIT_INITIAL_SERIAL,
 										agg_info->group_clauses,
 										NIL,
+										NIL,
 										&agg_costs,
 										dNumGroups);
 
@@ -3762,6 +3772,72 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
 										AGGSPLIT_INITIAL_SERIAL,
 										agg_info->group_clauses,
 										NIL,
+										NIL,
+										&agg_costs,
+										dNumPartialGroups);
+
+		add_partial_path(grouped_rel, path);
+	}
+
+	if (can_index && cheapest_total_path != NULL)
+	{
+		Path	   *path;
+
+		/*
+		 * Since the path originates from a non-grouped relation that is
+		 * not aware of eager aggregation, we must ensure that it provides
+		 * the correct input for partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   grouped_rel,
+											   cheapest_total_path,
+											   agg_info->agg_input);
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until the
+		 * final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										grouped_rel,
+										path,
+										agg_info->target,
+										AGG_INDEX,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										group_pathkeys,
+										&agg_costs,
+										dNumGroups);
+
+		add_path(grouped_rel, path);
+	}
+
+	if (can_index && cheapest_partial_path != NULL)
+	{
+		Path	   *path;
+
+		/*
+		 * Since the path originates from a non-grouped relation that is not
+		 * aware of eager aggregation, we must ensure that it provides the
+		 * correct input for partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   grouped_rel,
+											   cheapest_partial_path,
+											   agg_info->agg_input);
+
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until the
+		 * final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										grouped_rel,
+										path,
+										agg_info->target,
+										AGG_INDEX,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										group_pathkeys,
 										&agg_costs,
 										dNumPartialGroups);
 
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 8cdbf919902..337ac20b983 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3889,6 +3889,7 @@ create_grouping_paths(PlannerInfo *root,
 		if (   gd == NULL
 			&& root->numOrderedAggs == 0
 			&& parse->groupClause != NIL
+			&& parse->groupingSets == NIL
 			&& grouping_is_sortable(root->processed_groupClause)
 			&& grouping_is_hashable(root->processed_groupClause))
 			flags |= GROUPING_CAN_USE_INDEX;
@@ -5031,6 +5032,7 @@ create_partial_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										 AGGSPLIT_SIMPLE,
 										 root->processed_distinctClause,
 										 NIL,
+										 NIL,
 										 NULL,
 										 numDistinctRows));
 	}
@@ -5239,6 +5241,7 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 								 AGGSPLIT_SIMPLE,
 								 root->processed_distinctClause,
 								 NIL,
+								 NIL,
 								 NULL,
 								 numDistinctRows));
 	}
@@ -7210,6 +7213,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 											 AGGSPLIT_SIMPLE,
 											 info->clauses,
 											 havingQual,
+											 NIL,
 											 agg_costs,
 											 dNumGroups));
 				}
@@ -7281,6 +7285,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 												 AGGSPLIT_FINAL_DESERIAL,
 												 info->clauses,
 												 havingQual,
+												 NIL,
 												 agg_final_costs,
 												 dNumFinalGroups));
 					else
@@ -7322,6 +7327,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 									 AGGSPLIT_SIMPLE,
 									 root->processed_groupClause,
 									 havingQual,
+									 NIL,
 									 agg_costs,
 									 dNumGroups));
 		}
@@ -7341,6 +7347,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 									 AGGSPLIT_FINAL_DESERIAL,
 									 root->processed_groupClause,
 									 havingQual,
+									 NIL,
 									 agg_final_costs,
 									 dNumFinalGroups));
 		}
@@ -7348,10 +7355,10 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 
 	if (can_index)
 	{
-		/* 
-		 * Generate IndexAgg path.
-		 */
-		Assert(!parse->groupingSets);
+		List *pathkeys = make_pathkeys_for_sortclauses(root,
+													   root->processed_groupClause,
+													   root->processed_tlist);
+
 		add_path(grouped_rel, (Path *)
 				 create_agg_path(root,
 								 grouped_rel,
@@ -7361,8 +7368,29 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 								 AGGSPLIT_SIMPLE,
 								 root->processed_groupClause,
 								 havingQual,
+								 pathkeys,
 								 agg_costs,
 								 dNumGroups));
+
+		/*
+		 * Instead of operating directly on the input relation, we can
+		 * consider finalizing a partially aggregated path.
+		 */
+		if (partially_grouped_rel != NULL)
+		{
+			add_path(grouped_rel, (Path *)
+					 create_agg_path(root,
+									 grouped_rel,
+									 cheapest_partially_grouped_path,
+									 grouped_rel->reltarget,
+									 AGG_INDEX,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 root->processed_groupClause,
+									 havingQual,
+									 pathkeys,
+									 agg_final_costs,
+									 dNumFinalGroups));
+		}
 	}
 
 	/*
@@ -7411,6 +7439,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+	bool		can_index = (extra->flags & GROUPING_CAN_USE_INDEX) != 0;
 
 	/*
 	 * Check whether any partially aggregated paths have been generated
@@ -7562,6 +7591,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 											 AGGSPLIT_INITIAL_SERIAL,
 											 info->clauses,
 											 NIL,
+											 NIL,
 											 agg_partial_costs,
 											 dNumPartialGroups));
 				else
@@ -7620,6 +7650,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 													 AGGSPLIT_INITIAL_SERIAL,
 													 info->clauses,
 													 NIL,
+													 NIL,
 													 agg_partial_costs,
 													 dNumPartialPartialGroups));
 				else
@@ -7651,6 +7682,7 @@ create_partial_grouping_paths(PlannerInfo *root,
 								 AGGSPLIT_INITIAL_SERIAL,
 								 root->processed_groupClause,
 								 NIL,
+								 NIL,
 								 agg_partial_costs,
 								 dNumPartialGroups));
 	}
@@ -7669,6 +7701,62 @@ create_partial_grouping_paths(PlannerInfo *root,
 										 AGGSPLIT_INITIAL_SERIAL,
 										 root->processed_groupClause,
 										 NIL,
+										 NIL,
+										 agg_partial_costs,
+										 dNumPartialPartialGroups));
+	}
+	
+	/*
+	 * Add a partially-grouped IndexAgg Path where possible
+	 */
+	if (can_index && cheapest_total_path != NULL)
+	{
+		List *pathkeys;
+
+		/* This should have been checked previously */
+		Assert(parse->hasAggs || parse->groupClause);
+		
+		pathkeys = make_pathkeys_for_sortclauses(root,
+												 root->processed_groupClause,
+												 root->processed_tlist);
+
+		add_path(partially_grouped_rel, (Path *)
+				 create_agg_path(root,
+								 partially_grouped_rel,
+								 cheapest_total_path,
+								 partially_grouped_rel->reltarget,
+								 AGG_INDEX,
+								 AGGSPLIT_INITIAL_SERIAL,
+								 root->processed_groupClause,
+								 NIL,
+								 pathkeys,
+								 agg_partial_costs,
+								 dNumPartialGroups));
+	}
+
+	/*
+	 * Now add a partially-grouped IndexAgg partial Path where possible
+	 */
+	if (can_index && cheapest_partial_path != NULL)
+	{
+		List *pathkeys;
+
+		/* This should have been checked previously */
+		Assert(parse->hasAggs || parse->groupClause);
+
+		pathkeys = make_pathkeys_for_sortclauses(root,
+												 root->processed_groupClause,
+												 root->processed_tlist);
+		add_partial_path(partially_grouped_rel, (Path *)
+						  create_agg_path(root,
+										 partially_grouped_rel,
+										 cheapest_partial_path,
+										 partially_grouped_rel->reltarget,
+										 AGG_INDEX,
+										 AGGSPLIT_INITIAL_SERIAL,
+										 root->processed_groupClause,
+										 NIL,
+										 pathkeys,
 										 agg_partial_costs,
 										 dNumPartialPartialGroups));
 	}
@@ -8830,6 +8918,7 @@ create_final_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										AGGSPLIT_SIMPLE,
 										groupClause,
 										NIL,
+										NIL,
 										NULL,
 										unique_rel->rows);
 
@@ -8972,6 +9061,7 @@ create_partial_unique_paths(PlannerInfo *root, RelOptInfo *input_rel,
 										AGGSPLIT_SIMPLE,
 										groupClause,
 										NIL,
+										NIL,
 										NULL,
 										partial_unique_rel->rows);
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 78c95c36dd5..56b2c49d455 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -949,6 +949,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 											AGGSPLIT_SIMPLE,
 											groupList,
 											NIL,
+											NIL,
 											NULL,
 											dNumGroups);
 			add_path(result_rel, path);
@@ -965,6 +966,7 @@ generate_union_paths(SetOperationStmt *op, PlannerInfo *root,
 												AGGSPLIT_SIMPLE,
 												groupList,
 												NIL,
+												NIL,
 												NULL,
 												dNumGroups);
 				add_path(result_rel, path);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 93363eaba34..d28e50ee02d 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2988,6 +2988,7 @@ create_unique_path(PlannerInfo *root,
  * 'aggsplit' is the Agg node's aggregate-splitting mode
  * 'groupClause' is a list of SortGroupClause's representing the grouping
  * 'qual' is the HAVING quals if any
+ * 'pathkeys', used only for AGG_INDEX, is the list of PathKeys describing this agg node's output ordering (NIL otherwise)
  * 'aggcosts' contains cost info about the aggregate functions to be computed
  * 'numGroups' is the estimated number of groups (1 if not grouping)
  */
@@ -3000,6 +3001,7 @@ create_agg_path(PlannerInfo *root,
 				AggSplit aggsplit,
 				List *groupClause,
 				List *qual,
+				List *pathkeys,
 				const AggClauseCosts *aggcosts,
 				double numGroups)
 {
@@ -3033,11 +3035,17 @@ create_agg_path(PlannerInfo *root,
 	else if (aggstrategy == AGG_INDEX)
 	{
 		/* 
-		 * When using index aggregation all grouping columns will be used as
-		 * comparator keys, so output is always sorted.
+		 * For IndexAgg we must also know the ordering used, just like for
+		 * GroupAgg, but for the latter this information is supplied by the
+		 * child node, i.e. the Sort.  Here we cannot use
+		 * make_pathkeys_for_sortclauses, because with partial aggregates the
+		 * node will have a different target list and different sortgroupref
+		 * indexes, so that function would not find the required entries.
+		 * Hence the caller must build the pathkeys for us.
+		 *
+		 * NOTE: pathkeys CAN be NIL, e.g. if the planner decided that all
+		 * key values are the same constant.
 		 */
-		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, groupClause,
-																root->processed_tlist);
+		pathnode->path.pathkeys = pathkeys;
 	}
 	else
 		pathnode->path.pathkeys = NIL;	/* output is unordered */
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 9b2c4b3e7ef..ecf376057c1 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -235,6 +235,7 @@ extern AggPath *create_agg_path(PlannerInfo *root,
 								AggSplit aggsplit,
 								List *groupClause,
 								List *qual,
+								List *pathkeys,
 								const AggClauseCosts *aggcosts,
 								double numGroups);
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
-- 
2.43.0